Digital Content Identification

Current copyright policies need to change to keep up with the transition from analog content (books, music records, printed photos, etc.) to digital content (PDF, MP3, JPEG, etc.), a shift often referred to as the Digital Revolution. Digital content is created, copied, manipulated, and distributed at an enormous rate because technology and, more importantly, the Internet allow vast amounts of content to be exchanged and released with little regard for the original authors. Keeping track of all of this on the Internet is generally thought to be an impossible task.

However, it makes sense that technology and the Internet could and should be used to help manage and control copyright infringement in the digital revolution. Google has developed a technology called Content ID that allows copyright holders to identify and manage their content. If Google were to share this technology with the U.S. Copyright Office, it could lead to a very large-scale service with the potential to automate and regulate copyright law more efficiently.

This is possible because every type of digital content (music, video, text, etc.) has something in common: digital content is merely streams of numbers. These streams are usually referred to as bit streams. Each stream represents arbitrary data to be interpreted by a computer, which in turn creates something meaningful.

Most digital content conforms to one of several standards. The most basic standard is perhaps the American Standard Code for Information Interchange (ASCII), a character encoding scheme that provides us with a common way to represent text on any computer. For example, a computer would use ASCII to interpret the 7-bit binary code 110 0001 as the lowercase letter “a”. Similar standards exist for representing music, video, images, and other forms of digital content.
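The ASCII example above can be checked in a few lines of Python. This is only an illustrative sketch; the helper name to_bit_stream is hypothetical and was chosen for this example.

```python
# Interpret the 7-bit pattern 110 0001 as an ASCII character.
bits = "1100001"
code_point = int(bits, 2)  # the number the bits represent
letter = chr(code_point)   # the character that number stands for in ASCII


def to_bit_stream(text: str) -> str:
    """Hypothetical helper: show ASCII text as 8-bit groups, one per character."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


print(code_point, letter)
print(to_bit_stream("ab"))
```

The same idea, numbers given meaning by an agreed standard, carries over to audio, image, and video formats, only with far more elaborate rules.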

In reality, no one can determine whether content would infringe on an existing copyright without doing research with the U.S. Copyright Office. The Copyright Office currently uses an online database named the Electronic Copyright Office (eCO), which documents what has been copyrighted. Although it may not have the underlying data for the content, it at least has the name, description, and owner contact information.

Therefore, to determine whether content has already been copyrighted, a person must compare their content with the suspected owner’s content. This process takes a long time because human communication and analysis are required. The alternative I would propose is to let computers handle the easy comparisons while keeping people involved for the complicated requests. It’s important to keep the human component of this process to handle exceptions and complications that might not be detected by the Content ID system. Context is a frequent source of copyright exceptions, since things like “educational use” or “parody” are not easily understood by computers.

Binary code and pure digital data were never meant to be interpreted by human beings. Computers, on the other hand, are quite efficient at interpreting data when it is coded properly. Google released a system called Content ID in 2007 to give copyright owners the ability to manage how their content is made available on YouTube. The Content ID system analyzes the bit streams of uploaded video and music against a reference library to prevent copyright infringement. This library is generated from previously submitted content. If a match occurs, the owner’s policy preferences are then applied. Owners can choose to block, track, or monetize their content dynamically. For example, a record label might decide to block videos that use more than two minutes of their copyrighted song, but allow videos containing less than one minute.
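To make the matching-and-policy idea concrete, here is a minimal sketch. It is not how Content ID is actually built, which this paper does not describe. It assumes that content is split into fixed-size chunks, that each chunk stands for one second, and that chunks are compared by exact hash. Every name in it (Policy, register, check_upload, CHUNK_BYTES) is hypothetical, and the "needs review" outcome for lengths between the two thresholds is an assumption, since the record label example does not say what happens there.

```python
import hashlib
from dataclasses import dataclass

CHUNK_BYTES = 2  # assumption: pretend 2 bytes hold one second of content


@dataclass
class Policy:
    owner: str
    block_over_seconds: int   # block when the matched length exceeds this
    allow_under_seconds: int  # allow when the matched length is below this


def chunk_hashes(data: bytes) -> list:
    """Hash each complete fixed-size chunk of a bit stream."""
    return [
        hashlib.sha256(data[i:i + CHUNK_BYTES]).hexdigest()
        for i in range(0, len(data) - CHUNK_BYTES + 1, CHUNK_BYTES)
    ]


reference_library = {}  # maps a chunk hash to the owner's Policy


def register(data: bytes, policy: Policy) -> None:
    """Add previously submitted content to the reference library."""
    for h in chunk_hashes(data):
        reference_library[h] = policy


def check_upload(data: bytes) -> str:
    """Compare an upload with the library and apply the owner's policy."""
    matches = [reference_library[h] for h in chunk_hashes(data)
               if h in reference_library]
    if not matches:
        return "no match"
    policy = matches[0]             # sketch: assume a single owner matched
    matched_seconds = len(matches)  # one chunk stands for one second
    if matched_seconds > policy.block_over_seconds:
        return "block"
    if matched_seconds < policy.allow_under_seconds:
        return "allow"
    return "needs review"           # assumption: the gap goes to a person


# A made-up 180-second "song" in which every chunk is unique.
song = b"".join(i.to_bytes(2, "big") for i in range(180))
register(song, Policy("Example Records", 120, 60))

print(check_upload(song[:60]))   # uses 30 seconds of the song
print(check_upload(song[:300]))  # uses 150 seconds of the song
```

Exact hashing only matches identical, aligned data. A real system would need fingerprints that survive re-encoding, trimming, and other small changes, which is a large part of why sharing existing technology is attractive.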

Of course, since audio and video are very different, they must be handled separately for legal and technological reasons. The system identifies these differences and applies the relevant restrictions. A match can be audio only, video only, or both audio and video. This system gives content owners the ability to account for fair use of their content themselves. In turn, users can dispute a match on fair use grounds and seek approval from the copyright holder. The dispute is then automatically forwarded to the owner’s attention so they can decide whether or not it is a proper use of the copyrighted material.
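The three kinds of match described above could be represented along these lines. This is only a sketch; MatchType and classify_match are hypothetical names and are not part of any published Content ID interface.

```python
from enum import Enum
from typing import Optional


class MatchType(Enum):
    AUDIO_ONLY = "audio only"
    VIDEO_ONLY = "video only"
    AUDIO_AND_VIDEO = "audio and video"


def classify_match(audio_matched: bool, video_matched: bool) -> Optional[MatchType]:
    """Report which part of an upload matched; None means nothing matched."""
    if audio_matched and video_matched:
        return MatchType.AUDIO_AND_VIDEO
    if audio_matched:
        return MatchType.AUDIO_ONLY
    if video_matched:
        return MatchType.VIDEO_ONLY
    return None


# A home video with a copyrighted song playing in the background.
print(classify_match(audio_matched=True, video_matched=False))
```

Keeping the match type explicit lets an owner attach a different restriction to each case, for example allowing a video whose only match is a background song.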

The U.S. Copyright Office and Google should come together to offer a better system on a larger scale. It should be possible to incorporate some of the features of the Content ID system into the existing eCO system to provide a much better service for everyone. This would increase owners’ ability to control copyrighted content by giving them more options for fair use as well as a standard system for disputing claims. It would also make comparing similar content more efficient by computerizing the initial comparison and making it available on the web.

Once this service is provided by the Copyright Office, it will become a more centralized and trusted way for people to prevent copyright infringement. Instead of Google or any other organization having to maintain its own Content ID system, each could use the new service to get a more accurate and dependable result. This will ultimately help the digital revolution become a fairer and more controlled environment for copyright owners.

The system is not meant to enforce copyright laws; its purpose is to provide everyone with more information about copyrighted content. No doubt, this information can and will be used by people and lawyers as proof when enforcing copyright. However, it’s important to understand that it’s not the system itself enforcing copyright, because there are too many exceptions and problems with such an automated system.

It is necessary for human beings to always be a part of settling disputes, updating the system, and revising copyright laws. The system should support creativity, protect fair use, and protect the rights of the owners while maintaining the basic principles of copyright. The owner should be able to control how the work is used and should be compensated for that work. This system would be key to protecting copyright in the digital revolution.