New RNC Interface

The project to move the Russian National Corpus website to a new interface began in 2022 and took about two years. The updated main page of https://ruscorpora.ru went live in May 2022. Over the course of the project, the search interface of every corpus was gradually updated, and a range of new features and improvements were introduced to help users accomplish their tasks faster and more efficiently.

Goals of the new interface

Beyond letting users construct search queries and retrieve results based on the formal description of the corpus, the modern Corpus interface is designed to meet the growing expectations of Internet users. Today, users expect a site to be easy to use on any device they go online from and to remember their preferred ways of working. Tasks that users perform routinely on the Corpus website should not require excessive effort, even from experts. And of course, such a valuable resource should become more accessible to less experienced users who are interested in the Russian language but, unlike professional researchers, are not ready to draw academically sound conclusions on their own from large amounts of data. For every type of audience, it matters that the supporting information on the site (contextual hints, user manuals, announcements) is up to date.

Let's consider how the new interface at https://ruscorpora.ru addresses these challenges.

Interface that adapts to the user

According to SimilarWeb, as of November 2023, more than 60% of users went online from mobile devices: smartphones or tablets. The share of mobile users keeps growing, and users of the Russian National Corpus are no exception.

To meet these needs, the new version of the site provides a set of interface descriptions (style sheets) for different screen sizes, covering the most common classes of mobile and desktop devices, from the smallest smartphone to a desktop computer with a high-resolution screen.

The web application follows the "mobile first" approach: the smallest mobile devices, which have not only the smallest screens but also the least computing power, are the first to find the style sheet they need and do not have to perform a further, resource-intensive search through the other options. Each user automatically gets the version best suited to the screen size of the device they are using.
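A minimal Python sketch of the mobile-first idea follows. It is an illustration, not the site's code: the breakpoint names and pixel widths are hypothetical. Base styles target the smallest screens, and each larger breakpoint adds a `min-width` layer on top, so a small phone matches only the base layer and stops checking at the first breakpoint it does not reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    name: str
    min_width: int  # CSS pixels; the layer applies when the viewport is at least this wide


# Hypothetical breakpoints, ordered from smallest to largest (mobile first).
BREAKPOINTS = [
    Breakpoint("base", 0),
    Breakpoint("tablet", 600),
    Breakpoint("laptop", 1024),
    Breakpoint("desktop", 1440),
]


def active_layers(viewport_width: int) -> list[str]:
    """Return the style layers applied to a viewport, in cascade order."""
    layers = []
    for bp in BREAKPOINTS:
        if viewport_width < bp.min_width:
            # Breakpoints are sorted, so no later (wider) layer can match either.
            break
        layers.append(bp.name)
    return layers
```

For example, `active_layers(375)` returns `['base']`, and `active_layers(1280)` returns `['base', 'tablet', 'laptop']`. In real CSS the browser evaluates each `@media (min-width: ...)` rule itself; the early `break` only models the idea that smaller screens stop at the simplest layer.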

On the left is the desktop version of the site, while the version on the right is optimized for mobile devices.

For users whose primary language is not Russian, it is possible to switch the interface language to English.

All currently available settings for search results are gathered in a single Settings menu. User preferences are stored in the browser and applied to future search queries. They include the preferred type of search in each corpus (the corresponding search form is always shown expanded), the default type of search results, and whether detailed information about the query is shown or hidden in the corpus header.
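The sketch below shows one way such preferences could be kept client-side, assuming a string key-value store like the browser's `localStorage`, modeled here with a plain Python mapping. The setting names, values, and storage key are hypothetical; saved values are merged over defaults so that missing or corrupt data never breaks the search page.

```python
import copy
import json
from collections.abc import MutableMapping
from typing import Any

# Hypothetical key and defaults: the RNC's real setting names are not documented here.
STORAGE_KEY = "rnc_settings"
DEFAULT_SETTINGS: dict[str, Any] = {
    "preferred_search": {},               # corpus id -> search form shown expanded
    "default_results_view": "documents",  # default type of search results
    "show_query_details": True,           # query details in the corpus header
}


def load_settings(storage: MutableMapping[str, str]) -> dict[str, Any]:
    """Return saved settings merged over the defaults."""
    # Deep copy so callers can mutate nested values without touching the defaults.
    settings = copy.deepcopy(DEFAULT_SETTINGS)
    raw = storage.get(STORAGE_KEY)
    if raw is None:
        return settings
    try:
        saved = json.loads(raw)
    except json.JSONDecodeError:
        return settings  # corrupt data: fall back to the defaults
    if isinstance(saved, dict):
        # Keep only known keys, dropping settings left over from older versions.
        settings.update({k: v for k, v in saved.items() if k in DEFAULT_SETTINGS})
    return settings


def save_settings(storage: MutableMapping[str, str], settings: dict[str, Any]) -> None:
    """Serialize settings to JSON, since localStorage only holds strings."""
    storage[STORAGE_KEY] = json.dumps(settings)
```

A plain `dict` can stand in for `localStorage` when trying this out: load the settings, change `show_query_details`, save them, and a later `load_settings` call returns the changed preference.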

60% of users access the Internet from mobile devices

Help for non-expert audiences

The Corpus was conceived 20 years ago as a tool made by linguists for linguists. However, its materials have consistently attracted users from other professional fields, including editors, proofreaders, journalists, translators, and teachers, as well as less experienced users such as school students. It is crucial not to alienate this diverse audience but to remove barriers and give them the tools they need for their specific tasks.

In 2022, the RNC introduced the Get Overview service. It aims to give a broad audience an understanding of the key capabilities of the RNC corpora: it explains the general principles of the Corpus interface, shows the types of results that can be obtained, and points out common mistakes in constructing search queries. The Tip of the Day section of the overview regularly features detailed information about the most interesting new features.

Linguistics is so diverse that new corpus features designed for specialists in one area may not be clear or obvious even to linguists working in another. That does not mean these features are of no interest to them.

To help less experienced users quickly understand new features, the updated interface includes a regularly updated user manual with built-in search.

Professional users commonly, and productively, construct complex search queries on their own, analyze large amounts of data, and process the findings further to draw their own scientific conclusions. The new audience often has a different goal: to get a simple answer to their question quickly. Thanks to the modularity and extensibility of the platform, we can build different interfaces for professionals and for a wider audience on top of the same internal tools.

For example, the Word at a Glance service spares users from having to construct several queries in search and other services and then combine the results themselves. Instead, the user only needs to enter the dictionary form of a word, and the service presents a wide range of information for all available analyses of that lemma in a compact, easy-to-read form. The user sees the word's sketches (a list of typical collocations), all of its grammatical analyses (without having to search for examples to compare them), and more.

Word at a Glance links seamlessly to the full search functionality, and vice versa: from the search results, users can click the analysis of any word to go directly to its portrait. These cross-references let users move from one service to another without losing the context of their research, and they are another way we help users explore the diverse capabilities of the corpus.

Get Overview introduces the general principles of the Corpus interface, and Word at a Glance opens up the capabilities of the RNC to a wide audience

Visualization as a way to present complex data

Visualizing statistical data makes it possible to convey complex information to the user quickly and effectively.

Using pie and column charts, geographic maps, and graphs, the Corpus Portraits interface of the RNC presents the structure and composition of the corpora. In the Subcorpora Portraits section, users can use comparison charts to analyze how their subcorpus differs from the corpus as a whole.

To draw deeper and better-founded academic conclusions, expert users need to understand the details of how values are calculated and take them into account. The new interface not only describes the limitations of each calculation method in the user manual but also visualizes additional information. For instance, the Frequency results view displays confidence intervals for the calculated frequency. On graphs, we explicitly mark the time limits beyond which there is too little data for reliable conclusions, and below each graph, warming stripes show the number of texts in which results were found in different time periods.
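The RNC's exact interval method is not described here, so the following Python sketch assumes one common choice, the Wilson score interval for a binomial proportion, applied to frequency in instances per million tokens (ipm). The function name and signature are hypothetical.

```python
import math


def ipm_with_wilson_ci(hits: int, corpus_size: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (ipm, lower, upper): frequency per million tokens and its Wilson interval.

    z = 1.96 corresponds to an approximately 95% confidence level.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    if not 0 <= hits <= corpus_size:
        raise ValueError("hits must lie between 0 and corpus_size")
    n = corpus_size
    p = hits / n  # observed proportion of tokens that are hits
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    per_million = 1_000_000
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    lower = max(0.0, centre - half_width)
    upper = min(1.0, centre + half_width)
    return p * per_million, lower * per_million, upper * per_million
```

The same 5 ipm yields a much narrower interval from 500 hits in 100 million tokens than from 50 hits in 10 million tokens, which is the information an interval adds to a bare frequency value.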

Whereas search results are accompanied by the conventional graphs and charts commonly accepted in academic work, Word at a Glance uses several less conventional visualizations. "Similar words" are shown as a tag cloud in which the font size and the distance between words reflect how close the contexts are in which the words are used. Morphemic analysis is shown in a visual notation modeled on the way the Russian language is traditionally taught in schools.
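The font-size part of that encoding can be sketched as a linear mapping from similarity scores to font sizes. This is an assumption for illustration: the scores stand in for whatever similarity measure the service actually uses, the pixel range is arbitrary, and word placement (distance) is not modeled.

```python
def font_sizes(
    similarities: dict[str, float], min_px: float = 12.0, max_px: float = 36.0
) -> dict[str, float]:
    """Map similarity scores linearly onto [min_px, max_px]; the most similar word is largest."""
    if not similarities:
        return {}
    lowest = min(similarities.values())
    highest = max(similarities.values())
    spread = highest - lowest
    sizes = {}
    for word, score in similarities.items():
        # With a single distinct score there is nothing to scale, so use the largest size.
        t = (score - lowest) / spread if spread > 0 else 1.0
        sizes[word] = min_px + t * (max_px - min_px)
    return sizes
```

For instance, with made-up scores `{"жилище": 0.9, "здание": 0.5, "квартира": 0.7}`, the first word is drawn at 36 px, the second at 12 px, and the third at about 24 px.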

In several Corpus search scenarios, interface elements have been intentionally rearranged compared to the old version of the site.

A notable change is in the interface for setting lemma and tag search conditions. The groups of conditions for each word of a multi-word query are now arranged in a single row from left to right. This is visually more intuitive because it mirrors the way words follow one another on the same line of text.

The rearrangement also affected how search results are displayed in parallel corpora. The original fragment is now on the left and the translation on the right (users can switch between different translations), which fits more examples on one screen. On mobile devices, switching is done with a slider, a control more familiar to smartphone users.

The new interface incorporates a regularly updated and searchable User Guide.

Rapid resolution of routine tasks

Every user, regardless of expertise, performs a series of routine actions each time they visit the Corpus website. These actions should require minimal effort (as few clicks as possible) and be as quick and straightforward as possible. The new interface provides direct access to search in any corpus right from the main page, along with links to other frequently used features.

In sustained work with corpus search, keeping the context is crucial, and the user's current query should always be visible. A valuable innovation is that the corpus header now shows not only the parameters of the current query but also the parameters of the user's subcorpus. Users can return to the form and adjust these parameters at any time, which takes less effort than setting them again from scratch.

To share research results, including in publications in scientific journals, users can now use short links to queries and the Copy example button, which copies an example together with its source details to the clipboard for convenient sharing.

The resolution of routine tasks should be as swift and straightforward as possible.

Website content management

Supporting information on the Corpus website is kept up to date with a content management system. It lets editors structure, tag, and edit announcements, articles, contextual hints, and user manuals online, so that they can be continuously maintained and updated.

Updated on 27.12.2023