DeepScore: Digital Music Stand with Musical Understanding via Active Sheet Technology
Description
_Management Abstract
Playing and enjoying music is among the most rewarding recreational activities of humankind, for individuals as well as in group settings. Visiting concerts or sending one's children to music lessons - and thus being able to discuss and co-shape the musical part of our culture - are therefore important aspects of cultural participation and social interaction. But with public funding for culture and art in decline, new ways are needed to reduce the cost of, and ease access to, the variety of our music. The CTI-funded DeepScore project - a consortium of the Swiss startup ScorePad GmbH, the data science leader ZHAW Datalab, and the deep learning pioneers at IDSIA - set out in mid-2016 to address this by bringing digitization to the process of playing music, thus securing the quality of life that springs from affordable production and consumption of musical performances. By stretching the limits of current deep learning methods in computer vision, we aim to lift optical music recognition to a level where musicians can use it to convert musical scores into digital active sheets, enabling a number of new convenience functions such as playback, transposition, automatic page turning, intra-orchestra communication, etc. We already see high demand from professional orchestras as well as music schools - all because of the remarkably simple, but currently technically very challenging, idea of putting a tablet computer on the music stand that does more than just display pictures of scores.
Why not simply put a tablet on the music stand instead of a paper sheet? We know a lot of musicians who ask exactly this question. Particularly for the younger generation - but not only for them! - it seems almost self-evident to use mobile technology not only to support and improve their cultural musical activities, but also to sustain them and to enrich communication within the community. So why not just take a tablet and an app? The short answer: because nothing appropriate can be found, and in particular, digital scores of sufficient quality are very rare. Below we explain how the DeepScore project creates technology to digitize music efficiently and enables a digital music stand for professional and amateur musicians alike, thus enhancing the quality of life through the enjoyment of playing music.
The DeepScore project presented here is
- innovative and original: for the first time, deep learning methodology is applied to convert traditional music scores into a digital format, promising adequate transcription quality; and
- of general interest: there are huge libraries of works by renowned composers waiting to be converted into a digital format. This will allow music to be spread further by means of modern technology, while reliably letting the rights owners keep control as appropriate.
_Introduction to DeepScore and its result, the ScorePad app
For centuries, paper has been the material of choice for music scores, for lack of alternatives. In practice, however, it has some problematic properties: after a period of intense use, paper may become unusable; with a growing number of scores, archiving and searching become cumbersome; turning pages requires a free hand; a draught can blow sheets away; and so on. Musicians have a clear need for a modern, easy-to-use solution running on regular tablet computers. Besides replacing paper sheet music, ScorePad will go beyond the existing, mainly static (mostly PDF-based) electronic products. There will be individual functions, such as taking notes, editing, transposition, fading elements in and out, automatic page turning, and playback, as well as interactive functions that support and enrich the playing of ensembles and orchestras and the exchange of ideas among musicians. There is currently no comparable product on the market that integrates the functions relevant for musicians; our solution is designed by musicians for the specific needs of musicians. The DeepScore project brings together practitioners and active orchestras, a start-up (ScorePad GmbH), the IT research department of a university of applied sciences (ZHAW, with the AI group of Thilo Stadelmann and the computer music expertise of Philipp Ackermann), and a fundamental research institute (IDSIA, group of deep learning pioneer Juergen Schmidhuber) to make sure the developed solution fits these needs. The CTI-funded project, with an overall volume of 935'000 CHF, started in summer 2016 and will run for approximately two years before delivering a marketable result.
_What will be the effect?
We foresee the following three specific effects of ScorePad on the cultural scene:
New technology enables digitization in the music industry. Our deep learning-based technology opens the way to entirely new functions that one would not even have thought of when using paper. Some of them seem obvious, e.g.
- automatic page turning (most musicians have no free hand, and some not even a free foot),
- synchronization of entire orchestras (e.g. the conductor can give precise indications to musicians in real time),
- exchange of ideas in the community (e.g. to prepare a performance, or to share individual interpretations with peers),
- transposition (e.g. when a teacher wants to adapt a violin piece for a cello pupil; see the sketch after this list),
- support of education (learning programs, track-and-feedback).
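To make the transposition function concrete, here is a minimal, purely illustrative Python sketch (not ScorePad code): it shifts pitches given as MusicXML-style (step, alter, octave) triples by a number of semitones via MIDI note numbers. The function names and the simple sharp-only spelling are our own assumptions for illustration; a real implementation would respect key signatures and enharmonic spelling.

```python
# Hypothetical sketch of pitch transposition on MusicXML-style pitch triples
# (step, alter, octave); illustration only, not part of the ScorePad app.

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
SEMITONE_TO_SHARP = {0: ("C", 0), 1: ("C", 1), 2: ("D", 0), 3: ("D", 1), 4: ("E", 0),
                     5: ("F", 0), 6: ("F", 1), 7: ("G", 0), 8: ("G", 1),
                     9: ("A", 0), 10: ("A", 1), 11: ("B", 0)}

def to_midi(step: str, alter: int, octave: int) -> int:
    """Convert a MusicXML pitch to a MIDI note number (C4 = 60)."""
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter

def from_midi(midi: int) -> tuple[str, int, int]:
    """Convert a MIDI note number back to (step, alter, octave), spelling with sharps."""
    step, alter = SEMITONE_TO_SHARP[midi % 12]
    return step, alter, midi // 12 - 1

def transpose(pitch: tuple[str, int, int], semitones: int) -> tuple[str, int, int]:
    """Shift a pitch by a number of semitones."""
    return from_midi(to_midi(*pitch) + semitones)

# Example: move A4 (violin range) down a perfect twelfth (19 semitones) towards cello range.
print(transpose(("A", 0, 4), -19))   # -> ('D', 0, 3)
```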
New needs will certainly appear once musicians start using the new system. Once the system is developed, it will considerably simplify the daily work of practitioners and students, giving them easier access to their genuine activity – playing music.
Digitized Scores. A precondition for such a system to work is the availability of digitized scores of good quality. The established standard is MusicXML [1]; a minimal example of such a file is sketched after the list below. Today, digital scores of good quality are scarce. So far, music publishers have been very cautious with scores in non-traditional formats; even official offerings in PDF are rare. The reason is simple: illegal copying. Paper gets photocopied, and PDF documents can be copied and sent around the world even more easily. In this context, several points need to be mentioned.
- Currently, a musician in need of a score in digital format (MusicXML) has three options: (1) the score already exists and can be downloaded from the internet, which is quite rare; (2) use scanning software, but the resulting quality is poor and around 25% of the content must be corrected by hand; or (3) use advanced notation software (Sibelius or Finale) and type in the entire score by hand, which can be quite cumbersome.
- Scores in a logical digital format allow the implementation of proper digital rights management (DRM) and can thus be reliably copy-protected. This opens the door to cooperation with music publishers.
- DeepScore will contribute an efficient conversion of printed scores into a digital format (MusicXML). It is planned to offer this conversion feature either to users who have acquired a score in a traditional format, or directly to music publishers, allowing them to convert their own libraries.
- Thanks to the results of the DeepScore project, a musician will be free to choose the specific music he or she really needs and to practice with one single application. Today, a musician needs a separate app for almost every function, and is restricted to the score offerings within a given application (e.g. the Henle app only allows Henle scores to be downloaded and read).
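To illustrate what such a "logical" digital format looks like, the following sketch (for illustration only, not output of the DeepScore pipeline) uses Python's standard library to build a one-part, one-measure MusicXML document containing a single whole note; the part name and file name are arbitrary assumptions.

```python
# Minimal MusicXML example (one part, one measure, one whole note C4),
# built with Python's standard library purely for illustration.
import xml.etree.ElementTree as ET

score = ET.Element("score-partwise", version="3.1")

# Part list: one part with a human-readable name.
part_list = ET.SubElement(score, "part-list")
score_part = ET.SubElement(part_list, "score-part", id="P1")
ET.SubElement(score_part, "part-name").text = "Violin"

# The actual music: one measure in C major, 4/4, treble clef.
part = ET.SubElement(score, "part", id="P1")
measure = ET.SubElement(part, "measure", number="1")

attributes = ET.SubElement(measure, "attributes")
ET.SubElement(attributes, "divisions").text = "1"   # quarter note = 1 division
ET.SubElement(ET.SubElement(attributes, "key"), "fifths").text = "0"
time = ET.SubElement(attributes, "time")
ET.SubElement(time, "beats").text = "4"
ET.SubElement(time, "beat-type").text = "4"
clef = ET.SubElement(attributes, "clef")
ET.SubElement(clef, "sign").text = "G"
ET.SubElement(clef, "line").text = "2"

note = ET.SubElement(measure, "note")
pitch = ET.SubElement(note, "pitch")
ET.SubElement(pitch, "step").text = "C"
ET.SubElement(pitch, "octave").text = "4"
ET.SubElement(note, "duration").text = "4"          # whole note = 4 divisions
ET.SubElement(note, "type").text = "whole"

ET.ElementTree(score).write("example.musicxml", encoding="UTF-8", xml_declaration=True)
```

A notation program should render this as a single 4/4 measure with a whole-note C4; a production-quality file would additionally carry the MusicXML DOCTYPE declaration and metadata.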
Paper & Environment. It is well known that the production of paper is rather resource intensive (primarily wood, and energy). Traditional sheet music uses a lot of paper. According to one of the German big players in the music publishing industry, “[a]bout 130,000 titles (scores, books, recordings) […] are shipped from Mainz worldwide every year” [2]. There is no doubt that there is potential for a reduction in paper consumption by introducing this new technology. _What are the scientific issues, and where is the innovation?
Deep neural networks [3] have disrupted the world of computer vision, which became obvious to a wider audience at the latest with Alex Krizhevsky's breakthrough on the ImageNet task in 2012 [4]. The corresponding gains in perceptual performance on a wide variety of vision tasks have often been such that error rates could be halved or even improved by an order of magnitude, for example on the task of OCR (optical character recognition, the process from image to machine-readable text) [5]. Given this, and the performance gap described above in the related technology of OMR (optical music recognition), which has not yet profited from the deep learning surge, it is appealing to improve OMR by means of deep learning.
The main challenge is that the application is not straightforward: while in classical OCR the text is basically a 1D signal (the symbols to be recognized are organized in lines of fixed height, in which they extend from left to right or vice versa), musical notation can additionally be stacked arbitrarily along the vertical axis, thus becoming a 2D signal. Approached the usual way, this would exponentially increase the number of symbols to be recognized, which is intractable from a computational as well as from a classification point of view. The alternative route we are taking is to enhance the usual convolutional neural network (CNN) approach with techniques such as end-to-end learning of a combined detector-classifier in order to overcome this challenge. The OMR problem is not solved, though, once a deep neural network has learned to recognize single musical symbols: the individual symbols have to be assembled into a complete score, which is usually done with a lot of musical understanding encoded into rules. To focus the DeepScore innovation on the recognition part, we use the open source OMR system Audiveris [6] for the overarching process of creating MusicXML. We are in active discussion with the Audiveris project lead and will contribute our findings to the project, thus giving back to the music community. Our very first prototype, after one third of the project duration, already outperforms the best-in-class open source system by a large margin on the set of symbols that is hardest to classify. A workshop for the computer music open source community is planned for the summer to discuss these findings and further ideas, and to disseminate them widely within the community, creating even more impact.
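As a rough illustration of the combined detection-classification idea (a simplified stand-in, not the actual DeepScore network), the sketch below defines a tiny fully convolutional network in PyTorch that maps a grayscale score image directly to per-pixel symbol-class scores, so that localizing and classifying symbols are learned jointly instead of enumerating all vertically stacked symbol combinations; the layer sizes and the number of symbol classes are arbitrary assumptions.

```python
# Illustrative sketch only: a tiny fully convolutional network that predicts,
# for every pixel of a grayscale score image, a distribution over symbol
# classes plus a background class, so detection and classification are
# learned end-to-end from pixel-level annotations.
import torch
import torch.nn as nn

NUM_CLASSES = 32 + 1  # assumed number of musical symbol classes + background

class TinySymbolSegmenter(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 1/2 resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Upsample back to input resolution and predict one score per class and pixel.
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))    # (batch, num_classes, H, W)

if __name__ == "__main__":
    model = TinySymbolSegmenter()
    page = torch.rand(1, 1, 256, 256)          # fake grayscale score snippet
    logits = model(page)
    prediction = logits.argmax(dim=1)          # per-pixel class decision
    print(prediction.shape)                    # torch.Size([1, 256, 256])
```

Trained with a per-pixel cross-entropy loss against pixel-level symbol annotations, connected regions of each predicted class would yield symbol candidates; the approach actually pursued in the project goes well beyond this toy setup (cf. the publications listed below).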
_Conclusion
ScorePad is a tablet solution for musicians, who can use the scores of their choice, work as they are used to, exchange ideas with peers, and synchronize whole orchestras. Those who wish to use the new technology will be able to do so with one single application. This alone will contribute to controlling the costs of professional musical offerings such as classical concerts, keeping them affordable for the wider public. ScorePad is also of great value to music teachers and students of all levels, making learning and thus cultural participation easier and cheaper. ScorePad will thus offer a new system to the community of musicians, who are an important part of the living culture in our society. Additionally, the included exchange platform will support users in communicating with each other. By providing an efficient conversion procedure, the DeepScore project will lay the ground for the availability of digitized scores, which are a precondition for such a system. DeepScore does so by applying and enhancing state-of-the-art deep neural networks that have proven invaluable in other areas, but whose application to the 2D structure of musical notation is not straightforward. There are huge traditional music libraries – some can be considered a heritage of humanity – waiting to be converted into a modern, usable format [7]. All of this is done by going considerably beyond the current state of the art in deep learning and optical music recognition, in a collaboration between an industry partner and two academic partners, with a real product as the result.
_References
[1] www.musicxml.com
[2] en.schott-music.com/about/
[3] Schmidhuber, “Deep learning in neural networks”, 2014
[4] Krizhevsky, Sutskever, Hinton, “Imagenet classification with deep convolutional neural networks”, 2012
[5] Lee, Osindero, “Recursive Recurrent Nets With Attention Modeling for OCR in the Wild”, 2016
[6] Audiveris open music scanner, audiveris.kenai.com
[7] see e.g. dme.mozarteum.at/DME/main/
Key Data
Project lead
Project team
Dr. Philipp Ackermann, Diego Hernan Browarnik, Dr. Dan Ciresan, Ismail Elezi, Gabriel Eyyi, Prof. Dr. Jürgen Schmidhuber, Lukas Tuggener
Project partners
ScorePad AG; Swiss AI Lab IDSIA
Project status
completed, 07/2016 - 01/2019
Funding partner
CTI (KTI) project no. 17963.1 PFES-ES
Project budget
935'000 CHF
Publications
- Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Stadelmann, Thilo (2018): Deep watershed detector for music object recognition
- Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Pelillo, Marcello; Stadelmann, Thilo (2018): DeepScores: a dataset for segmentation, detection and classification of tiny objects
- Elezi, Ismail; Tuggener, Lukas; Pelillo, Marcello; Stadelmann, Thilo (2018): DeepScores and Deep Watershed Detection: current state and open issues