
Moving from liferea to newsbeuter
Tags: [liferea] [newsbeuter] [python]
Published: 24 Feb 2016 08:54

newsbeuter is a mutt-like reader for RSS feeds. I've been using liferea for a while now, but I just found out about newsbeuter and immediately wanted to switch - I enjoy using mutt a lot, and wanted an equivalently good tool for reading RSS feeds.

I had a lot of cached RSS content in liferea though, and wanted to move all of it from liferea’s db to newsbeuter’s db.

First, I looked at liferea's db. ~/.local/share/liferea/liferea.db is a sqlite database, and .schema will dump the tables in it. The table I'm interested in is items:

$ sqlite3 liferea.db
> .schema
CREATE TABLE items (
        item_id      INTEGER,
        parent_item_id INTEGER,
        node_id      TEXT,
        parent_node_id     TEXT,
        title       TEXT,
        read        INTEGER,
        updated INTEGER,
        popup    INTEGER,
        marked       INTEGER,
        source      TEXT,
        source_id       TEXT,
        valid_guid INTEGER,
        description  TEXT,
        date        INTEGER,
        comment_feed_id TEXT,
        comment            INTEGER,
        PRIMARY KEY (item_id));

Here is an example row from that table:

sqlite> select * from items limit 1;
193|0|gpburmi|gpburmi|A brief introduction to server-testing with
serverspec|1|1|1|0|
http://www.debian-administration.org/article/703/A_brief_introduction_to_server-testing_with_serverspec|
http://www.debian-administration.org/article/703/A_brief_introduction_to_server-testing_with_serverspec|1|<div
xmlns="http://www.w3.org/1999/xhtml"
xml:base="http://www.debian-administration.org/"><p>Serverspec is the
name of a Ruby tool which allows you to write simple tests, to validate
that a server is correctly configured.  Here we'll walk through the
installation of the tool, and show how it can be used to test a server
setup.</p></div>|1394705768||0

A little bit of python can export sqlite rows as json:

import sys
import sqlite3
import json

try:
    with sqlite3.connect("liferea.db") as con:
        cur = con.cursor()

        cur.execute("select * from items limit 1")

        for data in cur.fetchall():
            # cur.description holds one (name, ...) tuple per column
            columns = [x[0] for x in cur.description]
            output = dict(zip(columns, data))
            print(json.dumps(output, indent=2))

except sqlite3.Error as e:
    print("Error %s" % e.args[0])
    sys.exit(1)

Here’s that same row output as json:

{
  "comment": 0,
  "updated": 1,
  "popup": 1,
  "parent_item_id": 0,
  "description": "<div xmlns=\"http://www.w3.org/1999/xhtml\" xml...",
  "title": "A brief introduction to server-testing with serverspec",
  "read": 1,
  "comment_feed_id": null,
  "marked": 0,
  "source": "http://www.debian-administration.org/article/703/A_brief_introduction_to_server-testing_with_serverspec",
  "node_id": "gpburmi",
  "source_id": "http://www.debian-administration.org/article/703/A_brief_introduction_to_server-testing_with_serverspec",
  "date": 1394705768,
  "item_id": 193,
  "valid_guid": 1,
  "parent_node_id": "gpburmi"
}

~/.newsbeuter/cache.db is also a sqlite database:

> .schema
CREATE TABLE rss_item (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        guid VARCHAR(64) NOT NULL,
        title VARCHAR(1024) NOT NULL,
        author VARCHAR(1024) NOT NULL,
        url VARCHAR(1024) NOT NULL,
        feedurl VARCHAR(1024) NOT NULL,
        pubDate INTEGER NOT NULL,
        content VARCHAR(65535) NOT NULL,
        unread INTEGER(1) NOT NULL,
        enclosure_url VARCHAR(1024),
        enclosure_type VARCHAR(1024),
        enqueued INTEGER(1) NOT NULL DEFAULT 0,
        flags VARCHAR(52),
        deleted INTEGER(1) NOT NULL DEFAULT 0,
        base VARCHAR(128) NOT NULL DEFAULT "");

An example newsbeuter row in json:

{
  "feedurl": "https://operand.ca/feeds/rss.xml",
  "unread": 1,
  "enclosure_type": "",
  "pubDate": 1383968048,
  "author": "James MacMahon",
  "url": "https://operand.ca/2013/11/08/generating-a-blog.html",
  "title": "About me, and generating a blog with jekyll",
  "enqueued": 0,
  "content": "<h1 id=\"about-me\">About me</h1>\n<p>...\n",
  "deleted": 0,
  "enclosure_url": "",
  "flags": null,
  "base": "",
  "guid": "https://operand.ca/2013/11/08/generating-a-blog.html",
  "id": 1
}

Given these two schemas, liferea columns can be mapped to newsbeuter columns like so:

feedurl = (1)
unread = not liferea.read
enclosure_type = ""
pubDate = liferea.date
author = (2)
url = liferea.source
title = liferea.title
enqueued = 0
content = liferea.description
deleted = 0
enclosure_url = ""
flags = null
base = ""
guid = liferea.source_id

For (1) and (2), two extra queries against the liferea database pull this information from other tables:

  1. To fill feedurl, grab source from the subscription table:

    sqlite> select * from subscription where node_id="gpburmi";
    gpburmi|http://www.debian-administration.org/atom.xml|http://www.debian-administration.org/atom.xml||-1|-1|0|0
    
  2. To fill author, grab author from the metadata table:

    sqlite> select * from metadata where item_id=193 and key='author';
    193|2|author|Steve (<a href="http://www.debian-administration.org/users/Steve">Website</a>)
    
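Putting the column mapping and the two lookup queries together, the whole import could be sketched roughly like this. This is a simplified sketch, not the actual scripts from the repo: the migrate function name and the minimal column handling are my own, the metadata table is assumed to store (item_id, nr, key, value), and deduplication by guid plus error handling are omitted:

```python
import sqlite3

def migrate(liferea_db, newsbeuter_db):
    """Copy cached items from liferea's db into newsbeuter's cache.db."""
    src = sqlite3.connect(liferea_db)
    dst = sqlite3.connect(newsbeuter_db)

    items = src.execute(
        "select item_id, node_id, title, read, source, "
        "source_id, description, date from items")

    for item_id, node_id, title, read, source, source_id, description, date in items:
        # (1) feedurl: look up source in the subscription table
        row = src.execute(
            "select source from subscription where node_id = ?",
            (node_id,)).fetchone()
        feedurl = row[0] if row else ""

        # (2) author: look up the 'author' key in the metadata table
        row = src.execute(
            "select value from metadata where item_id = ? and key = 'author'",
            (item_id,)).fetchone()
        author = row[0] if row else ""

        # Apply the mapping above; enclosure fields stay empty,
        # flags stays null, base stays at its default
        dst.execute(
            "insert into rss_item (guid, title, author, url, feedurl, "
            "pubDate, content, unread, enclosure_url, enclosure_type, "
            "enqueued, deleted) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (source_id, title, author, source, feedurl, date,
             description, 0 if read else 1, "", "", 0, 0))

    dst.commit()
    src.close()
    dst.close()
```

Note that unread is the negation of liferea's read column, which is why a read item (read = 1) becomes unread = 0.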

Not straightforward, but it works. I decided not to port over enclosure information because I won't be using this to download podcasts - I already have a solution for that.

That's basically it - run the export code on the liferea database, then run the import code on the newsbeuter database. I've put the export and import scripts up on my GitHub in a repo named liferea_to_newsbeuter. They're a bit hacky, but they get the job done.