Google pushes “text fragment links” with new Chrome extension

New feature can deep-link to specific text on a Web page, with highlighting.

Google has been cooking up an extension to the URL standard called “Text Fragments.” The new link style will allow you to link not just to a page but to specific text on a page, which will get scrolled to and highlighted automatically once the page loads. It’s like an anchor link, but with highlighting and creatable by anyone.

The feature has actually been supported in Chrome since version 80, which hit the stable channel in February 2020. Now a new extension from Google makes it easy to create this new link type, which will work for anyone else using Chrome on desktop OSes and Android. Google has proposed the idea to the W3C and hopes other browsers will adopt it, but even if they don’t, the links are backward-compatible.

The syntax for this URL is pretty strange looking. After the URL, the magic is in the string “#:~:text=” and then whatever text you want to match. So a full link would look like this:

https://en.wikipedia.org/wiki/Cat#:~:text=Most breeds of cat have a noted fondness for sitting in high places

If you copy and paste this into Chrome, the browser will open Wikipedia’s cat page, scroll to the first text that matches “Most breeds of cat have a noted fondness for sitting in high places,” and highlight it. If the text doesn’t match anything, the page will still load. Backward-compatibility works because browsers already treat everything after the number sign (#) as a URI fragment, which usually gets used for anchor links that are made by the page creator. If you paste this into a browser that doesn’t support text fragments, the page will still load, and everything after the number sign will just be ignored as a bad anchor link. So far, so good.
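To make the structure of such a link concrete, here is a minimal Python sketch that splits a text fragment link into the ordinary page URL and the text to highlight. The function name `split_text_fragment` is hypothetical, and the sketch assumes the simplest form of the syntax shown above, a single “text=” value with nothing else after it.

```python
from urllib.parse import unquote, urlsplit, urlunsplit

FRAGMENT_DIRECTIVE = ":~:"  # marks where the text fragment part begins


def split_text_fragment(url):
    """Return (page_url, text_to_highlight); the text is None if absent.

    Simplification: only a single plain "text=" value is handled.
    """
    parts = urlsplit(url)
    marker_at = parts.fragment.find(FRAGMENT_DIRECTIVE)
    if marker_at == -1:
        # No text fragment: the URL is an ordinary link.
        return url, None
    directive = parts.fragment[marker_at + len(FRAGMENT_DIRECTIVE):]
    # Whatever precedes ":~:" is still a normal anchor, so keep it.
    page_url = urlunsplit(parts._replace(fragment=parts.fragment[:marker_at]))
    if not directive.startswith("text="):
        return page_url, None
    return page_url, unquote(directive[len("text="):])


if __name__ == "__main__":
    link = "https://en.wikipedia.org/wiki/Cat#:~:text=high%20places"
    print(split_text_fragment(link))
```

A browser without text fragment support effectively only uses the first half of that result: it loads the page and finds no anchor that matches the rest.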

One problem is that this means you can have spaces in a URL. On a webpage or forum, you can hand-code the link with an a href tag (or whatever the non-HTML equivalent is) and everything will work. For instant messengers and social media, though, which don’t allow code and use automatic URL parsers, things get a bit more complicated. Automatic URL parsers generally treat a space as the end of a URL, so you’ll need to use percent-encoding to replace each space with its equivalent, “%20.” URL parsers now have a shot at linkifying this correctly, but it looks like a mess:

https://en.wikipedia.org/wiki/Cat#:~:text=Most%20breeds%20of%20cat%20have%20a%20noted%20fondness%20for%20sitting%20in%20high%20places.
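The space problem is easy to reproduce. The sketch below is a deliberately naive linkifier, assuming a parser that simply grabs everything from “http” up to the next whitespace; real messengers use their own, more elaborate rules, and the name `first_link` is hypothetical.

```python
import re

# Naive rule: a URL runs from the scheme up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def first_link(message):
    """Return the first URL-looking run of characters, or None."""
    match = URL_PATTERN.search(message)
    return match.group(0) if match else None


if __name__ == "__main__":
    raw = "Look: https://en.wikipedia.org/wiki/Cat#:~:text=Most breeds of cat"
    encoded = "Look: https://en.wikipedia.org/wiki/Cat#:~:text=Most%20breeds%20of%20cat"
    print(first_link(raw))      # stops at the first space
    print(first_link(encoded))  # keeps the whole fragment
```

With the raw link, only “text=Most” survives, so the browser would highlight the wrong thing or nothing at all.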

Spaces aren’t the only characters that can cause problems. The URI standard, RFC 3986, defines several “reserved” characters that have a special meaning in a URL, so they can’t appear as literal text in one without being encoded. Web-page-authoring tools tend to handle these characters automatically, but now that you’re embedding arbitrary sentences in a URL for highlighting, there’s a higher chance you’ll run into one of these reserved characters: ! * ' ( ) ; : @ & = + $ , / ? # [ ]. They all need to be percent-encoded in order for the URL to work, and Google’s extension takes care of that for you.
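If you’d rather build these links yourself, the encoding step can be sketched in a few lines of Python. This is not how Google’s extension does it, just an illustration: the helper name `text_fragment_url` is hypothetical, and it relies on the standard library’s `urllib.parse.quote` with an empty `safe` set so that spaces and every reserved character get percent-encoded. The hyphen is encoded too, on the assumption (from the Text Fragments draft) that it has its own meaning inside the “text=” value.

```python
from urllib.parse import quote


def text_fragment_url(page_url, text):
    """Build a '#:~:text=' link that highlights `text` on `page_url`."""
    # safe="" makes quote() encode "/" as well, along with spaces and
    # all the other reserved characters.
    encoded = quote(text, safe="")
    # Assumption: "-" is special inside a text directive, so encode it too.
    encoded = encoded.replace("-", "%2D")
    return f"{page_url}#:~:text={encoded}"


if __name__ == "__main__":
    print(text_fragment_url(
        "https://en.wikipedia.org/wiki/Cat",
        "Most breeds of cat have a noted fondness for sitting in high places",
    ))
```

The result for the Wikipedia example is the same percent-encoded link shown earlier, minus any trailing punctuation.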

Google’s new Chrome extension, called “Link to Text Fragment” (it’s also on GitHub), will put a new entry in Chrome’s right-click menu. You just highlight text on a page, right-click it, and hit “Copy link to selected text.” Like magic, a text fragment link will end up on your clipboard. All the text encoding is done automatically, so the link should work with most websites and messengers.

Google seems like it is going to start pushing out support for text fragments across its Web ecosystem, even without the W3C. The links have already started to show up in some Google search results, which allow Chrome users to zip right to the relevant text. It’s probably only a matter of time before link creation moves from an extension to a normal Chrome feature.

How to OCR Documents for Free in Google Drive

Google Drive makes it painless to go paperless. Its collaborative documents, spreadsheets, and presentations already help curtail paper usage, but its OCR feature helps curb the paper mess even more.

OCR, or Optical Character Recognition, is the most important tech to help you go paperless. Scanned documents on their own are only glorified pictures of your documents, but let your computer recognize the text and they instantly become a ton more useful. We’ve already looked at how to OCR documents in Adobe Acrobat, and at turning PDF files into documents you can edit in Word as well.

Now, if you don’t have a copy of Acrobat or Word, there’s an even better option: Google Drive. It includes a little-known free OCR tool that is a powerful, easy-to-use image-to-text converter.

In this tutorial, we’ll look at what Google Drive’s OCR process is and the simple steps to begin working with it. I’ll show you how to use Google Drive to quickly convert your scanned images and PDF documents into editable text files online.

Google Cloud: Optical Character Recognition (OCR) Tutorial

Learn how to perform optical character recognition (OCR) on Google Cloud Platform. This tutorial demonstrates how to upload image files to Google Cloud Storage, extract text from the images using the Google Cloud Vision API, translate the text using the Google Cloud Translation API, and save your translations back to Cloud Storage. Google Cloud Pub/Sub is used to queue various tasks and trigger the right Cloud Functions to carry them out.

Objectives

  • Write and deploy several Background Cloud Functions.
  • Upload images to Cloud Storage.
  • Extract, translate and save text contained in uploaded images.

Visualizing the flow of data

The flow of data in the OCR tutorial application involves several steps:

  1. An image that contains text in any language is uploaded to Cloud Storage.
  2. A Cloud Function is triggered, which uses the Vision API to extract the text and detect the source language.
  3. The text is queued for translation by publishing a message to a Pub/Sub topic. A translation is queued for each target language different from the source language.
  4. If a target language matches the source language, the translation queue is skipped, and the text is sent straight to the result queue, another Pub/Sub topic.
  5. A Cloud Function uses the Translation API to translate the text in the translation queue. The translated result is sent to the result queue.
  6. Another Cloud Function saves the translated text from the result queue to Cloud Storage.
  7. The results are found in Cloud Storage as txt files for each translation.
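The routing decision in steps 3 and 4 can be sketched without any cloud services. The following Python snippet is only an in-memory stand-in: the two `deque` objects play the role of the Pub/Sub topics, and the function name and message fields are hypothetical rather than taken from the tutorial’s code.

```python
from collections import deque


def route_detected_text(text, source_lang, target_langs,
                        translate_queue, result_queue):
    """Queue one message per target language (steps 3 and 4)."""
    for lang in target_langs:
        message = {"text": text, "src_lang": source_lang, "lang": lang}
        if lang == source_lang:
            # Already in the target language: skip translation.
            result_queue.append(message)
        else:
            translate_queue.append(message)


if __name__ == "__main__":
    translate_queue = deque()  # stand-in for the translation topic
    result_queue = deque()     # stand-in for the result topic
    route_detected_text("Hola mundo", "es", ["en", "es", "fr"],
                        translate_queue, result_queue)
    print([m["lang"] for m in translate_queue])
    print([m["lang"] for m in result_queue])
```

In the real application, each append would be a publish call to the matching Pub/Sub topic, and separate Cloud Functions would consume the two queues.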

It may help to visualize the steps: