I use Calibre to manage the ebooks I’ve accumulated over time. I thought it would be great to have ratings from GoodReads.com visible directly in Calibre, as that would help me decide what to read next. I specifically wanted GoodReads ratings because I trust them more than the ratings on Amazon.
I took a peek at the Calibre plugins and found that there is one for GoodReads. That’s great! I installed and used it, but it rounds the ratings it gets from GoodReads to integer values. That’s not good. There is a pretty significant difference between a rating of 3.55 and one of 3.98. I looked through the plugin’s options, and around the internet in general, for a way to store ratings as decimal values, but had no luck.
I looked up where the Calibre plugins are stored (LocalUser\AppData\Roaming\calibre\plugins) and found the plugin there as a zip file. Opening the zip file showed that it contains Python scripts, which can be easily edited. Well, this looks promising!
The field where the ratings are stored turned out to be an integer field, so I couldn’t push decimal rating values into it. I decided to use the publisher field instead, as I don’t much care who published a book. By editing the .py script file inside the plugin zip file, I was able to push the GoodReads ratings into the publisher field as decimal values. (The changes in code are described below).
The decimal ratings alone weren’t enough. There is still a huge difference between a rating of 4.7 from 10 ratings and a rating of 4.2 from 20,000+ ratings. So I edited the script further to add the ratings count next to the decimal rating, separated by a space. Figuring out the XPath to read the ratings count gave me a bit of a headache, but it’s a good thing I had Python installed with Visual Studio. I spun up a Python web scraping project, installed the required dependencies through the pip package manager, and experimented on the GoodReads book details page to figure out the right way to get the ratings count value.
Here are the changes I made to the worker.py file:
# Created a copy of the method parse_rating and named it parse_rating_withcount
# Within the new method, add the following below rating_node = root.xpath ...
rating_count = root.xpath('//*[@itemprop="ratingCount"]/@content')
# In the try block, add the line
# (xpath returns a list of matches, so take the first one)
rating_text = rating_node.text + " " + rating_count[0]
# Replace the return of the rating value with
return rating_text
# Now we'll use the new method
# Add the following below the line: mi.publisher, mi.pubdate = self.parse_publisher_and_date(root)
mi.publisher = self.parse_rating_withcount(root)
# All done!
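For anyone who wants to try the idea outside of Calibre first, here is a minimal standalone sketch of combining the rating and the ratings count into one string. It is not the plugin's actual code: it uses Python's built-in xml.etree.ElementTree rather than the plugin's root.xpath call, and the sample markup (including the ratingValue element) is a made-up, simplified stand-in for the real book page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the book details page.
# The real page is far larger and its markup may differ.
SAMPLE_PAGE = """
<html><body>
  <span itemprop="ratingValue">4.51</span>
  <meta itemprop="ratingCount" content="93087" />
</body></html>
""".strip()


def parse_rating_withcount(root):
    """Return 'rating count' as one string, or None if either part is missing."""
    rating_node = root.find(".//*[@itemprop='ratingValue']")
    count_node = root.find(".//*[@itemprop='ratingCount']")
    if rating_node is None or count_node is None:
        return None
    rating_text = (rating_node.text or "").strip()
    rating_count = (count_node.get("content") or "").strip()
    if not rating_text or not rating_count:
        return None
    # Same format as stored in the publisher field: rating, space, count.
    return rating_text + " " + rating_count


root = ET.fromstring(SAMPLE_PAGE)
combined = parse_rating_withcount(root)
print(combined)
```

Returning None when either value is missing keeps a half-filled string out of the publisher field, which matters later because the searches assume the "rating count" layout.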
Now the ‘Download Metadata and Covers’ option populates the publisher field with ratings and the ratings count. E.g. 4.51 93087.
Remember that Calibre search supports regular expressions. For example, to show all books with:
- A rating of 4 or higher with at least 10,000 ratings, use the search string:
  - publisher:"~^4\.\d\d[ \r\n]*\d\d\d\d\d"
- A rating of 4.3 or higher with at least 1,000 ratings and a tag of Humor or Humour, use the search string:
  - tags:"~Humo" AND publisher:"~^4\.[3-9]\d[ \r\n]*\d\d\d\d"
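A quick way to check patterns like these before typing them into Calibre is to run them against a few sample values in Python. The sketch below is only an approximation of how Calibre applies them: it escapes the decimal point, uses [3-9] for the 4.3 threshold, and assumes the publisher field always holds a two-decimal rating followed by the count. The sample values are made up.

```python
import re

# Rating of 4.00 or higher, followed by a count of at least five digits.
HIGH_RATED_POPULAR = re.compile(r"^4\.\d\d[ \r\n]*\d{5}")

# Rating of 4.30 or higher, followed by a count of at least four digits.
VERY_HIGH_RATED = re.compile(r"^4\.[3-9]\d[ \r\n]*\d{4}")

# Made-up publisher field values in the "rating count" layout.
samples = ["4.51 93087", "4.20 812", "3.98 20411", "4.35 1500", "4.29 25000"]

popular = [s for s in samples if HIGH_RATED_POPULAR.search(s)]
very_high = [s for s in samples if VERY_HIGH_RATED.search(s)]

print(popular)
print(very_high)
```

Note that both patterns only cover ratings that start with 4, so a book rated exactly 5.00 would need its own pattern.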
The ‘Extract ISBN’ and ‘Clean Metadata’ plugins are great for further cleanup.
Text Search within ePub eBooks
Most of my books are in the ePub format, and I wanted a way to search through their text. I found EpubSharp to be a great C# library for reading the text contents of ePub files. The good thing about EpubSharp, compared to another library I checked out, was that wrapping the calls to EpubSharp in a try block let me skip bad ePub files, whereas the other library kept throwing exceptions and halting execution.
After an appropriate ePub reading library was found, the implementation of the search functionality was just a quick project in Visual Studio.
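My own project was in C# with EpubSharp, but the same idea can be sketched in Python with only the standard library. This is a rough illustration, not a port of my code: it relies on the fact that an ePub file is a zip archive of (X)HTML documents, strips tags with a crude regular expression, and skips any file that fails to open. The function names epub_contains and search_library are made up for this example.

```python
import re
import zipfile
from pathlib import Path

# Crude tag stripper; good enough for a plain-text search sketch.
TAG_RE = re.compile(r"<[^>]+>")


def epub_contains(epub_path, needle):
    """Return True if any (X)HTML document inside the ePub contains needle."""
    needle = needle.lower()
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = archive.read(name).decode("utf-8", errors="ignore")
                text = TAG_RE.sub(" ", markup)
                if needle in text.lower():
                    return True
    return False


def search_library(folder, needle):
    """Return the ePub files under folder whose text contains needle."""
    hits = []
    for epub_path in sorted(Path(folder).rglob("*.epub")):
        try:
            if epub_contains(epub_path, needle):
                hits.append(epub_path)
        except (zipfile.BadZipFile, OSError):
            # Skip damaged or unreadable files instead of halting the search.
            continue
    return hits
```

Calling search_library with a library folder and a search phrase returns a list of matching file paths. The try block around each book plays the same role as the one around the EpubSharp calls: one bad file should not stop the whole run.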
What remains now is to actually make use of the library and read more!