I walked past my server, but I noticed something amiss: the console was spitting out read errors for my hard drive. So, I shut the whole server down (which is why the server was down for the past week). I got a new Samsung 870 EVO 1TB for Christmas, then installed it and copied over the data. However, some data was unreadable on the disk. Most of it was inconsequential, such as old journal files. Some of it was important, such as part of my Postgres database. Uh oh.


Identifying the Problem

Problem number one is that I did not have backups. Lesson learned.

As some of you can guess, I'm a big fan of RSS. I run my own FreshRSS instance, which is backed by my Postgres database. As I logged on to view what I had missed over the past week, nothing showed up. SSH-ing to my server, opening the psql shell, and running a select * from fr_me_entry gave me an error: ERROR: could not read block 912 in file "base/16504/278344": read only 0 of 8192 bytes.

Step One: Getting As Much of the File Back

First things first, I shut down the Postgres server. Then, I made a full backup of the current state. However, some files did not copy over from the old disk to the new disk properly. So, step one was to use ddrescue to get as much of the broken file as possible from the old hard drive.

root@server:/mnt/lib/postgresql/14/main/base/16504# ddrescue 278344 ~/278344
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:    7642 kB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:    7642 kB, non-scraped:        0 B,  average rate:   69792 B/s
non-tried:        0 B,  bad-sector:    32768 B,    error rate:     256 B/s
  rescued:   12492 kB,   bad areas:        4,        run time:      2m 59s
pct rescued:   99.73%, read errors:       67,  remaining time:          0s
                              time since last successful read:         35s
Finished

Great! I got most of the file!

I replaced the broken file in Postgres with the recovered file, crossed my fingers, and ran select * from fr_me_entries.

ERROR: unexpected chunk number 2 (expected 0) for toast value 298148 in pg_toast_278344

Ugh.

Step Two: Fixing The Database

After scouring the internet, it seemed like the PostgreSQL Toast data in some lines were corrupted. Some more searching brought me to this lifesaver of a website. Time to remove the broken rows.

First, because I'm wonderful at following directions, I reindexed the toast table: REINDEX TABLE pg_toast.pg_toast_278344;

I knew the toasted values in the rows was the message, or the body of the RSS item. So, I used the script on the website (with a few modifications) to locate broken lines.

DO $f$
declare
    curid BIGINT := 0;
    vcontent TEXT;
    badid BIGINT;
begin
FOR badid IN SELECT id FROM fr_me_entry LOOP
    curid = curid + 1;
    if curid % 1000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT content
        INTO vcontent
        FROM fr_me_entry where id = badid;
        vcontent := substr(vcontent,1000,2000);
    exception
        when others then
            raise notice 'data for message % is corrupt.', badid;
            continue;
    end;
end loop;
end;
$f$;

I got the output of the script.

NOTICE:  data for message 1621857039614111 is corrupt
NOTICE:  data for message 1621857039614117 is corrupt
NOTICE:  1000 rows inspected
NOTICE:  data for message 1638760466010399 is corrupt
NOTICE:  data for message 1639844468285335 is corrupt
NOTICE:  data for message 1621857039614110 is corrupt
NOTICE:  2000 rows inspected
NOTICE:  data for message 1621857039614112 is corrupt
NOTICE:  data for message 1628270672577036 is corrupt
NOTICE:  3000 rows inspected
NOTICE:  data for message 1621857039614109 is corrupt
NOTICE:  data for message 1622960448368537 is corrupt
NOTICE:  data for message 1621857039614122 is corrupt
NOTICE:  data for message 1631013697483559 is corrupt
NOTICE:  4000 rows inspected
NOTICE:  data for message 1621857039614115 is corrupt
NOTICE:  data for message 1628270672577039 is corrupt
NOTICE:  data for message 1628270672577038 is corrupt
NOTICE:  data for message 1637457642679647 is corrupt
NOTICE:  data for message 1621857039614116 is corrupt
NOTICE:  data for message 1621857039614119 is corrupt
NOTICE:  5000 rows inspected
NOTICE:  6000 rows inspected
NOTICE:  7000 rows inspected
NOTICE:  data for message 1628270672577034 is corrupt
NOTICE:  data for message 1628270672577041 is corrupt
NOTICE:  8000 rows inspected
NOTICE:  9000 rows inspected
NOTICE:  10000 rows inspected
NOTICE:  11000 rows inspected
NOTICE:  12000 rows inspected
NOTICE:  13000 rows inspected
NOTICE:  14000 rows inspected
NOTICE:  15000 rows inspected

The website said to unlink the rows by updating the values to no longer use the toast values with an UPDATE command.

UPDATE fr_me_entry SET content = '' WHERE id = 1621857039614111;

Rerunning the script gave the exact same results, but without the line NOTICE: data for message 1621857039614111 is corrupt. Wonderful. I modified the script slightly to automatically unlink the information.

DO $f$
declare
    curid BIGINT := 0;
    vcontent TEXT;
    badid BIGINT;
begin
FOR badid IN SELECT id FROM fr_me_entry LOOP
    curid = curid + 1;
    if curid % 1000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT content
        INTO vcontent
        FROM fr_me_entry where id = badid;
        vcontent := substr(vcontent,1000,2000);
    exception
        when others then
            raise notice 'data for message % is corrupt. unlinking.', badid;
            update fr_me_entry set content = '' where id = badid;
            continue;
    end;
end loop;
end;
$f$;

Let's open FreshRSS.

FreshRSS Showing Unread Articles

Success. Oh wow, 1385 unread articles. This may take a while...

Previous Post Next Post