GoodReads Decimal Ratings in Calibre

I use Calibre to manage the ebooks I’ve accumulated over time. I thought it would be great to have ratings from GoodReads.com directly visible in Calibre, since that would help me decide what to read next. I specifically wanted ratings from GoodReads because I trust them more than the ratings on Amazon.

I took a peek at the Calibre plugins and found that there is one for GoodReads. That’s great! I installed and used it, but it rounds the ratings it gets from GoodReads off to whole numbers. That’s not good: there is a pretty significant difference between a rating of 3.55 and one of 3.98. I looked through the plugin’s options, and around the internet in general, for a way to store the ratings as decimal values, but had no luck.

I looked up where the Calibre plugins are stored (LocalUser\AppData\Roaming\calibre\plugins) and found the plugin there as a zip file. Opening the zip file showed that it contains Python scripts, which can be easily edited. Well, this looks promising!

The field where the ratings are stored turned out to be an integer field, so I couldn’t push decimal rating values in there. I decided to use the publisher field instead, as I don’t much care who the publisher of a book is. By editing the .py script file within the plugin zip file, I was able to push the GoodReads ratings into the publisher field as decimal values. (The code changes are described below.)

The decimal ratings alone weren’t enough. There is still a huge difference between a rating of 4.7 based on 10 ratings and a rating of 4.2 based on 20,000+ ratings. So I edited the script further to add the ratings count next to the decimal rating, separated by a space. Figuring out the XPath to read the ratings count gave me a bit of a headache, but it’s a good thing I had Python installed with Visual Studio. I spun up a Python web scraping project, installed the required dependencies through pip (the Python package manager), and experimented on a GoodReads book details page to figure out the right way to get the ratings count value.

Here are the changes I made to the worker.py file:

# Create a copy of the method parse_rating and name it parse_rating_withcount
# Within the new method, add the following below rating_node = root.xpath ...
rating_count = root.xpath('//*[@itemprop="ratingCount"]/@content')
# In the try block, add the line
rating_text = rating_node[0].text + " " + rating_count[0]
# Replace the line that returns the rating value with
return rating_text
# Now use the new method
# Add the following below the line: mi.publisher, mi.pubdate = self.parse_publisher_and_date(root)
mi.publisher = self.parse_rating_withcount(root)
# All done!
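
For anyone who wants to try the lookup outside of Calibre first, here is a minimal standalone sketch of the same idea. It is not the plugin's code. It uses Python's built-in xml.etree.ElementTree rather than whatever parser the plugin uses, the sample markup is made up, and both the itemprop="ratingValue" element and the function name parse_rating_with_count are assumptions for illustration. Only the ratingCount lookup comes from the changes above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a GoodReads book page.
# The real page markup may differ; "ratingValue" is an assumption.
SAMPLE_PAGE = """
<html><body>
  <span itemprop="ratingValue"> 4.51 </span>
  <meta itemprop="ratingCount" content="93087" />
</body></html>
"""


def parse_rating_with_count(root):
    """Return 'rating count' (e.g. '4.51 93087'), or None if either part is missing."""
    rating_node = root.find(".//*[@itemprop='ratingValue']")
    count_node = root.find(".//*[@itemprop='ratingCount']")
    if rating_node is None or count_node is None or not rating_node.text:
        return None
    rating = rating_node.text.strip()
    count = (count_node.get("content") or "").strip()
    if not rating or not count:
        return None
    # Same "rating, space, count" layout that ends up in the publisher field.
    return f"{rating} {count}"


root = ET.fromstring(SAMPLE_PAGE)
print(parse_rating_with_count(root))
```

Returning None when either piece is missing, instead of raising, means a book page without a ratings count would simply leave the field empty.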

Now the ‘Download Metadata and Covers’ option populates the publisher field with the rating and the ratings count, e.g. 4.51 93087.

Remember that Calibre search supports using regular expressions. For example, to show all books with:

  • Rating between 4.00 and 4.99 with at least 10,000 ratings use the search string:
    • publisher:"~^4\.\d\d[ \r\n]*\d\d\d\d\d"
  • Rating between 4.30 and 4.99 with at least 1,000 ratings and the tag Humor or Humour use the search string:
    • tags:"~Humo" AND publisher:"~^4\.[3456789]\d[ \r\n]*\d\d\d\d"
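
Patterns like these are easy to get subtly wrong, so it can help to try them on a few values before typing them into Calibre. The sketch below uses Python's re module on made-up publisher values. The patterns are written a little more compactly (an escaped dot, `[3-9]`, and `\d{5}` in place of five `\d`s), and it assumes the rating always has two decimal places, as in the 4.51 93087 example.

```python
import re

# Made-up publisher-field values in the "rating count" layout described above.
samples = ["4.51 93087", "4.70 10", "3.98 250000", "4.35 1200", "4.05 18000"]

# Rating 4.00-4.99 followed by a count with at least five digits (10,000+).
popular_4_plus = re.compile(r"^4\.\d\d[ \r\n]*\d{5}")

# Rating 4.30-4.99 followed by a count with at least four digits (1,000+).
popular_4_3_plus = re.compile(r"^4\.[3-9]\d[ \r\n]*\d{4}")

matches_4_plus = [s for s in samples if popular_4_plus.search(s)]
matches_4_3_plus = [s for s in samples if popular_4_3_plus.search(s)]

print(matches_4_plus)    # ['4.51 93087', '4.05 18000']
print(matches_4_3_plus)  # ['4.51 93087', '4.35 1200']
```

Because the digit runs are only anchored at the start, "at least five digits" also matches counts with six or more digits, which is what makes these "at least" searches.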

Further cleanup

The ‘Extract ISBN’ and ‘Clean Metadata’ plugins are great for further cleanup.

Text Search within ePub eBooks

Most of the books I have are in the ePub format, and I wanted a way to search through their text. I found EpubSharp to be a great C# library for reading the text contents of ePub files. The good thing about EpubSharp, compared to another library I checked out, was that when I wrapped the calls to EpubSharp in a try block, bad ePub files were simply skipped, whereas the other library threw exceptions and halted execution.

With a suitable ePub reading library found, implementing the search functionality was just a quick project in Visual Studio.
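
To show the general shape of such a search, here is a rough sketch in Python rather than C#. It does not use EpubSharp and is not the project described above. It relies only on the fact that an ePub file is a zip archive of (X)HTML documents, and the function names epub_contains and search_library are made up for illustration. The try block plays the same role as before: a bad file is reported and skipped instead of stopping the run.

```python
import html
import re
import zipfile
from pathlib import Path

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def epub_contains(epub_path, needle):
    """Return True if `needle` occurs (ignoring case) in any HTML file inside the ePub."""
    needle = needle.lower()
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            markup = archive.read(name).decode("utf-8", errors="ignore")
            # Drop tags, decode entities, and collapse runs of whitespace.
            text = " ".join(html.unescape(TAG_RE.sub(" ", markup)).split())
            if needle in text.lower():
                return True
    return False


def search_library(folder, needle):
    """Yield the paths of ePub files under `folder` whose text contains `needle`."""
    for epub_path in sorted(Path(folder).rglob("*.epub")):
        try:
            if epub_contains(epub_path, needle):
                yield epub_path
        except (zipfile.BadZipFile, OSError) as exc:
            # Skip unreadable or corrupt files instead of halting the search.
            print(f"Skipping {epub_path}: {exc}")
```

Usage would look like `for path in search_library("path/to/library", "some phrase"): print(path)`. A real library does more than this (reading order, metadata, encodings), so treat it as an outline of the approach only.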

What’s Next

What remains now is to actually make use of the library and read more!

6 thoughts on “GoodReads Decimal Ratings in Calibre”

  1. Could you show the full code for this – not an expert in py

  2. I dug into the code, and even though I don’t know Python, I could see what to modify. Turns out it’s very easy. It’s just two sections to modify.

    For others who may be Python-challenged, I marked up a screenshot of what to add.

    https://www.screencast.com/t/zR1RFUe5

  3. Really great work! I’m completely lost with modifying plugins. I’ve been trying to find a way to populate a custom column with the number of reviews on Goodreads–do you think this method would work?

    1. It won’t work with a custom column. I used the existing ‘publisher’ column and left out populating the publisher information.

  4. Got it. What about just substituting the number of ratings with the number of reviews?

    1. This part of the code will have to be changed to get the review count instead of the ratings count: root.xpath('//*[@itemprop="ratingCount"]/@content')

      You’d have to read up on XPath and see the html source of any book page on GoodReads to figure out what XPath to use to get the review count (if that information is present on the book pages).
