Text Fragments

Draft Community Group Report, 9 June 2020

Abstract

Text Fragments adds support for specifying a text snippet in the URL fragment. When navigating to a URL with such a fragment, the user agent can quickly emphasise and/or bring it to the user’s attention.

The core use case for text fragments is to allow URLs to serve as an exact text reference across the web. For example, Wikipedia references could link to the exact text they are quoting from a page. Similarly, search engines can serve URLs that direct the user to the answer they are looking for in the page rather than linking to the top of the page.

2.1.2. User sharing

With text fragments, browsers may implement an option to ‘Copy URL to here’ when the user opens the context menu on a text selection. The browser can then generate a URL with the text selection appropriately specified, and the recipient of the URL will have the specified text conveniently indicated. Without text fragments, if a user wants to share a passage of text from a page, they would likely just copy and paste the passage, in which case the receiver loses the context of the page.

This specification intentionally doesn’t define what actions a user agent should or could take to “indicate” a text match. There are different experiences and trade-offs a user agent could make. Some examples of possible actions:

  • Providing visual emphasis or highlight of the text passage
  • Automatically scrolling the passage into view when the page is navigated
  • Activating a UA’s find-in-page feature on the text passage
  • Providing a “Click to scroll to text passage” notification
  • Providing a notification when the text passage isn’t found in the page

 

3.2. Syntax

This section is non-normative

A text fragment directive is specified in the fragment directive (see § 3.3 The Fragment Directive) with the following format:

#:~:text=[prefix-,]textStart[,textEnd][,-suffix]
          context  |-------match-----|  context

(Square brackets indicate an optional parameter)

The text parameters are percent-decoded before matching. Dash (-), ampersand (&), and comma (,) characters in text parameters must be percent-encoded to avoid being interpreted as part of the text directive syntax.
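The encoding rule above can be sketched in Python. This is a minimal illustration rather than anything the specification defines: the helper name encode_text_param is hypothetical, and it assumes the standard library's urllib.parse.quote is an acceptable percent-encoder. Because quote never encodes a dash, the dash is replaced explicitly.

```python
from urllib.parse import quote


def encode_text_param(text: str) -> str:
    """Percent-encode one text directive parameter (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including
    # ampersand (&) and comma (,). Dash (-) is never encoded by quote(),
    # so it is replaced by hand to keep it out of the directive syntax.
    return quote(text, safe="").replace("-", "%2D")


print("#:~:text=" + encode_text_param("an example text fragment"))
print("#:~:text=" + encode_text_param("rock-solid, fast & small"))
```

The first call reproduces the directive form used in this section; the second shows a dash, a comma, and an ampersand all being escaped so that none of them can be read as syntax.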

The only required parameter is textStart. If only textStart is specified, the first instance of this exact text string is the target text.

#:~:text=an%20example%20text%20fragment indicates that the exact text “an example text fragment” is the target text.

If the textEnd parameter is also specified, then the text directive refers to a range of text in the page. The target text range starts at the first instance of textStart and runs until the first instance of textEnd that appears after textStart. This is equivalent to specifying the entire text range in the textStart parameter, but keeps the URL from being bloated with a long text directive.

#:~:text=an%20example,text%20fragment indicates that the first instance of “an example” until the following first instance of “text fragment” is the target text.
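As a rough sketch of the range rule, the lookup can be written over a plain string. This assumes the page has already been reduced to a single string of text and ignores everything a real user agent has to handle (DOM structure, word boundaries, case folding); the function name find_target_range is hypothetical.

```python
from typing import Optional, Tuple


def find_target_range(page_text: str, text_start: str,
                      text_end: Optional[str] = None) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the target text, or None if not found."""
    start = page_text.find(text_start)
    if start == -1:
        return None
    end = start + len(text_start)
    if text_end is None:
        # Only textStart given: the first exact instance is the target.
        return start, end
    # textEnd must be the first instance that appears after textStart.
    end_match = page_text.find(text_end, end)
    if end_match == -1:
        return None
    return start, end_match + len(text_end)


page = "Here is an example of a long text fragment. Another text fragment follows."
start, end = find_target_range(page, "an example", "text fragment")
print(page[start:end])
```

With both parameters, the range stops at the first "text fragment" after "an example", not at the later one.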

3.2.1. Context Terms

This section is non-normative

The other two optional parameters are context terms. They are specified by the dash (-) character succeeding the prefix and preceding the suffix, to differentiate them from the textStart and textEnd parameters, as any combination of optional parameters may be specified.

Context terms are used to disambiguate the target text fragment. The context terms can specify the text immediately before (prefix) and immediately after (suffix) the text fragment, allowing for whitespace.

While the context terms must be the immediate text surrounding the target text fragment, any amount of whitespace is allowed between the context terms and the text fragment. This lets context terms span element boundaries, for example when the target text fragment is at the beginning of a paragraph and must be disambiguated by using the previous element’s text as a prefix.

The context terms are not part of the targeted text fragment and must not be visually indicated.

#:~:text=this%20is-,an%20example,-text%20fragment would match to “an example” in “this is an example text fragment”, but not match to “an example” in “here is an example text”.
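The example above can be checked with a small sketch that parses a directive value and applies the context terms. It is a simplification under stated assumptions: the page is a plain string, matching is done with a regular expression instead of the specification's own algorithm, and the names TextDirective, parse_text_directive, and match_directive are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import unquote


class TextDirective(NamedTuple):
    prefix: Optional[str]
    text_start: str
    text_end: Optional[str]
    suffix: Optional[str]


def parse_text_directive(value: str) -> TextDirective:
    """Split the value that follows 'text=' into its four parameters."""
    terms = value.split(",")
    prefix = suffix = None
    # A leading term that ends with '-' is the prefix context term.
    if terms[0].endswith("-"):
        prefix = unquote(terms.pop(0)[:-1])
    # A trailing term that starts with '-' is the suffix context term.
    if terms and terms[-1].startswith("-"):
        suffix = unquote(terms.pop()[1:])
    if not 1 <= len(terms) <= 2 or not terms[0]:
        raise ValueError("expected textStart, optionally followed by textEnd")
    text_end = unquote(terms[1]) if len(terms) == 2 else None
    return TextDirective(prefix, unquote(terms[0]), text_end, suffix)


def match_directive(page_text: str, d: TextDirective) -> Optional[str]:
    """Return the matched target text (context excluded), or None."""
    pattern = ""
    if d.prefix is not None:
        pattern += re.escape(d.prefix) + r"\s*"  # whitespace allowed after prefix
    pattern += "(" + re.escape(d.text_start)
    if d.text_end is not None:
        pattern += ".*?" + re.escape(d.text_end)
    pattern += ")"
    if d.suffix is not None:
        pattern += r"\s*" + re.escape(d.suffix)  # whitespace allowed before suffix
    found = re.search(pattern, page_text, re.DOTALL)
    return found.group(1) if found else None


directive = parse_text_directive("this%20is-,an%20example,-text%20fragment")
print(match_directive("this is an example text fragment", directive))
print(match_directive("here is an example text", directive))
```

Only the captured group is returned, which mirrors the rule that context terms are not part of the targeted text.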

 

Google pushes “text fragment links” with new Chrome extension

New feature can deep-link to specific text on a Web page, with highlighting.

Google has been cooking up an extension to the URL standard called “Text Fragments.” The new link style will allow you to link not just to a page but to specific text on a page, which will get scrolled to and highlighted automatically once the page loads. It’s like an anchor link, but with highlighting and creatable by anyone.

The feature has actually been supported in Chrome since version 80, which hit the stable channel in February. Now a new extension from Google makes it easy to create this new link type, which will work for anyone else using Chrome on desktop OSes and Android. Google has proposed the idea to the W3C and hopes other browsers will adopt it, but even if they don’t, the links are backward-compatible.

The syntax for this URL is pretty strange looking. After the URL, the magic is in the string “#:~:text=” and then whatever text you want to match. So a full link would look like this:

https://en.wikipedia.org/wiki/Cat#:~:text=Most breeds of cat have a noted fondness for sitting in high places

If you copy and paste this into Chrome, the browser will open Wikipedia’s cat page, scroll to the first text that matches “Most breeds of cat have a noted fondness for sitting in high places,” and will highlight it. If the text doesn’t match anything, the page will still load. Backward-compatibility works because browsers currently support the number sign (#) as a URI fragment, which usually gets used for anchor links that are made by the page creator. If you paste this into a browser that doesn’t support it, the page will still load, and everything after the number sign will just be ignored as a bad anchor link. So far, so good.

One problem is that this means you can have spaces in a URL. On a webpage or forum, you can hand-code the link with an <a href> tag (or whatever the non-HTML equivalent is) and everything will work. For instant messengers and social media though, which don’t allow code and use automatic URL parsers, things get a bit more complicated. Every URL parser treats a space as the end of a URL, so you’ll need to use percent-encoding to replace all the spaces with the equivalent “%20.” URL parsers now have a shot at linkifying this correctly, but it looks like a mess:

https://en.wikipedia.org/wiki/Cat#:~:text=Most%20breeds%20of%20cat%20have%20a%20noted%20fondness%20for%20sitting%20in%20high%20places.
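To see that the percent-encoded link still carries the same sentence, here is a minimal Python sketch that pulls the text back out of such a URL. It assumes a simple link with a single text value and no context terms or range; the helper name linked_text is hypothetical, and treating “&” as a separator between directives follows the draft’s rule that an ampersand is syntax unless it is percent-encoded.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

DELIMITER = ":~:"  # separates an ordinary fragment from the fragment directive


def linked_text(url: str) -> Optional[str]:
    """Return the decoded text of a simple '#:~:text=...' link, or None."""
    fragment = urlsplit(url).fragment
    _, found, directive = fragment.partition(DELIMITER)
    if not found:
        return None  # an ordinary anchor link, or no fragment at all
    for part in directive.split("&"):
        if part.startswith("text="):
            return unquote(part[len("text="):])
    return None


url = ("https://en.wikipedia.org/wiki/Cat#:~:text="
       "Most%20breeds%20of%20cat%20have%20a%20noted%20fondness"
       "%20for%20sitting%20in%20high%20places")
print(linked_text(url))
```

An ordinary anchor link such as one ending in #History has no “:~:” delimiter, so the helper returns None for it.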

Spaces aren’t the only characters that can cause problems. The standard RFC 3986 defines several “reserved” characters as having a special meaning in a URL, so they shouldn’t appear unencoded as ordinary text in a URL. Web-page-authoring tools tend to handle these characters automatically, but now that you’re embedding arbitrary sentences in a URL for highlighting, there’s a higher chance you’ll run into one of these reserved characters: ! * ' ( ) ; : @ & = + $ , / ? # [ ]. They all need to be percent-encoded in order for the URL to work, and Google’s extension takes care of that for you.
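For a quick look at what percent-encoding does to each of those characters, here is a short Python sketch using the standard library’s urllib.parse.quote. It is only an illustration of the encoding itself and makes no claim about how Google’s extension is implemented.

```python
from urllib.parse import quote

RESERVED = "!*'();:@&=+$,/?#[]"  # the reserved characters listed above

# safe="" tells quote() to leave none of these characters unencoded.
encoded = {char: quote(char, safe="") for char in RESERVED}

for char, escape in encoded.items():
    print(char, "->", escape)
```

Each character becomes a percent sign followed by its two-digit hexadecimal code, for example “&” becomes “%26” and “#” becomes “%23.”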

Google’s new Chrome extension, called “Link to Text Fragment” (it’s also on GitHub), will put a new entry in Chrome’s right-click menu. You just highlight text on a page, right-click it, and hit “Copy link to selected text.” Like magic, a text fragment link will end up on your clipboard. All the text encoding is done automatically, so the link should work with most websites and messengers.

Google seems like it is going to start pushing out support for text fragments across its Web ecosystem, even without the W3C. The links have already started to show up in some Google search results, which allow Chrome users to zip right to the relevant text. It’s probably only a matter of time before link creation moves from an extension to a normal Chrome feature.

How to OCR Documents for Free in Google Drive

Google Drive makes it painless to go paperless. Its collaborative documents, spreadsheets, and presentations already help curtail paper usage, but its OCR feature helps curb the paper mess even more.

OCR, or Optical Character Recognition, is the most important tech to help you go paperless. Scanned documents on their own are only glorified pictures of your documents, but let your computer recognize the text and they instantly become a ton more useful. We’ve already looked at how to OCR documents in Adobe Acrobat, and at turning PDF files into documents you can edit in Word.

Now, if you don’t have a copy of Acrobat or Word, there’s an even better option: Google Drive. It includes a little-known free OCR tool that is a powerful, easy-to-use image-to-text converter.

In this tutorial, we’ll look at what Google Drive’s OCR process is and the simple steps to begin working with it. I’ll show you how to use Google Drive to quickly convert your scanned images and PDF documents into editable text files online.