Curating bepress metadata for harvesting by SHARE

Hello world!

Here is the gist of a talk that I gave, with kind assistance and input from Wendy Robertson, at the recent DC-GLUG 2017 (hosted by the Michael Schwartz Library at Cleveland State University). The talk covered a project that four bepress IR coordinators, all of whom were also part of the first cohort of SHARE digital curation associates, worked on together.

BePress Metadata and SHARE

What is SHARE?

BePress Metadata and SHARE (1).png

SHARE stands for SHared Access Research Ecosystem.

“SHARE is a higher education initiative whose mission is to maximize research impact by making research widely accessible, discoverable, and reusable. To fulfill this mission SHARE is developing services to gather and freely share information about research and scholarly activities across their life cycle. Making research and scholarship freely and openly available encourages innovation and increases the diversity of innovators.

Where open metadata about research already exists, its usefulness is limited by poor or inconsistent quality or by difficulty of access. For most individuals or groups to use this data, the cost of accessing, collecting, and improving the data is too great.”

SHARE was initiated and founded by a partnership of the Association of Research Libraries (ARL), the Association of American Universities (AAU), and the Association of Public and Land-grant Universities (APLU). In collaboration with the Center for Open Science, SHARE is building critical infrastructure so that research outputs are discoverable and reusable, and so that these digital assets of research can be traced throughout their life cycle.

The SHARE 2.0 release has “enhanced search capabilities, including filtering by preprint or publication, by subject, by funder, and by institution. SHARE 2.0 gives you the information you need to find new, relevant research and to find potential collaborators.”

As of July 2017, there are 156 providers to the SHARE database. In addition to a large number of institutional repositories, SHARE harvests from other open data sources, including Crossref, BioMed Central, PLOS, and the Dryad data repository.

BePress Metadata and SHARE (8).png

Having your metadata harvested by SHARE is easy!

It is easy because the SHARE pipeline is format agnostic and normalizes the data for you. For an early example and explanation, see Rick Johnstons’ blog post SHARE Metadata Is Stitching Together the Research Life Cycle. All you need to do is register via an online form and provide the SHARE folks with the base URL of your repository, followed by do/oai (for example, http://[Site URL]/do/oai/). If you prefer, you can also push records directly into the SHARE database by using the API.

SHARE digital curation group

As part of the 2016-2017 SHARE Curation Associates program, we were asked to review our own repository metadata, verify which fields are pushed to the OAI-PMH endpoint, and see how they were captured by the SHARE harvester. Several of us were using the bepress platform and thought it worthwhile to collaborate on our research and our findings.

Lisa Palmer, Emily Stenberg, Wendy Robertson and I talked and emailed over several months and prepared a gap analysis of the metadata provided by our institutions and harvested by SHARE. We had three goals in mind: to improve institutional metadata curation processes; to provide good and consistent metadata to SHARE; and to develop recommendations that other Digital Commons institutions could apply, whether they were SHARE contributors or not. The intent was to help create a shared understanding of how our data was harvested. We began by looking at the Digital Commons default Dublin Core mapping for various kinds of bepress collection structures and mapping it to the then-current SHARE schema (it is still in beta and continues to evolve).

Our initial findings were presented as a poster by Lisa Palmer at ACRL 2017: Mind the gap: Curating Digital Commons Metadata for SHARE.

Harvesting bepress metadata

Digital Commons exposes metadata fields for harvesting through four different metadata formats, or prefixes, as shown below:

BePress Metadata and SHARE (9).png

To view your own data and return a set of associated metadata records exposed through OAI, you can adapt the following sample, which will display your IR’s first 100 records in a repository-level request:

http://[Site URL]/do/oai/?verb=ListRecords&metadataPrefix=[metadata prefix, e.g. oai_dc]

For more instruction on bepress harvesting, see the Digital Commons and OAI-PMH guide at https://www.bepress.com/reference_guide_dc/digital-commons-oai-harvesting/
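The request pattern above can be sketched in a few lines of Python. This is only an illustration, not part of our project tooling: the function names and the example site URL are hypothetical, and the parser assumes a standard OAI-PMH response in the oai_dc format. It builds the ListRecords URL and pulls the identifier, titles, and dates out of a response you have already downloaded, along with the resumption token that OAI-PMH uses to page past the first batch of records.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core XML namespaces.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(site_url, metadata_prefix="oai_dc"):
    """Build a repository-level ListRecords request for a Digital Commons site.

    site_url is your repository's base URL (hypothetical example:
    "http://ir.example.edu").
    """
    base = site_url.rstrip("/") + "/do/oai/"
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base + "?" + query


def summarize_records(xml_text):
    """Return (records, resumption_token) from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        records.append({
            "identifier": record.findtext(f"{OAI_NS}header/{OAI_NS}identifier"),
            "titles": [t.text for t in record.iter(DC_NS + "title")],
            "dates": [d.text for d in record.iter(DC_NS + "date")],
        })
    # A non-empty token means more records are available in a follow-up request.
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)
```

To try it against a live repository, you could fetch the URL returned by list_records_url with urllib.request.urlopen and pass the decoded response body to summarize_records.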

Highlights of our Project

As we delved into the data, we noticed some issues and sometimes struggled to understand clearly what various pieces of data meant when taken out of the context of the IR. The issues ranged from a default that did not fit all collections to more complex problems, such as identifying which of several possible dates an item carried, or how to retain institutional affiliation and disambiguate names. Since preprint, postprint, and version of record may matter to our users, how might we best indicate that in our metadata? Distinguishing between different sorts of dates can be very challenging to apply consistently. And while the OpenURL link takes us to the final published product, how can we better indicate what dc:format and dc:source refer to? A bare value such as <dc:format.extent>297</dc:format.extent> could be better described.

The following slides show some examples of issues that we discovered on our first dive into the data:

Some things we discovered en route

[Six slide images from “BePress Metadata and SHARE”]

Since the SHARE aggregator is still under development, our metadata target has not been a stable one, and over the past year the schema has continued to be reformed and improved. As we have made suggestions and as SHARE has evolved, changes have been made, and they will need to be monitored going forward. For now, these are some of our general recommendations.

Our Recommendations:

  • Consult bepress documentation on metadata options and OAI-PMH
  • Review how records for different collections are exposed in the various bepress OAI-PMH formats. Are custom fields mapping as expected/desired?
  • Create standard metadata using consistent internal field names. Develop an ideal format for each collection type on your demo site and use it as a reference going forward
  • Add a “data dictionary” to your repository at the collection level. Work with your bepress consultant to modify and migrate existing collections using this documentation.
  • Share your practices publicly
  • Link from your repository to your data dictionary on an external site such as Google Sheets or GitHub, and share it with the Digital Commons user group or Resource Library

SHARE recommendations

  • Every OAI source supports oai_dc, but they usually also support at least one other format that has richer, more structured data, like oai_datacite or mods.
  • Choose the format that seems to have the most useful data for SHARE, especially if a transformer for that format already exists.
  • Choose oai_dc only as a last resort.
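As a rough illustration of the SHARE advice above, the sketch below reads an OAI-PMH ListMetadataFormats response and picks the richest available format, falling back to oai_dc last. The preference order and the function names are assumptions made for this example, not a list published by SHARE or bepress; substitute the prefixes your own repository actually exposes.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Assumed preference order for illustration: richer formats first, oai_dc last.
PREFERENCE = ("oai_datacite", "mods", "oai_dc")


def available_prefixes(xml_text):
    """List the metadataPrefix values in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [e.text.strip() for e in root.iter(OAI_NS + "metadataPrefix") if e.text]


def choose_prefix(available, preference=PREFERENCE):
    """Return the first preferred prefix the source supports, if any."""
    for prefix in preference:
        if prefix in available:
            return prefix
    # Nothing from the preference list: fall back to whatever is offered.
    return available[0] if available else None
```

The response to parse comes from the same endpoint as before, with verb=ListMetadataFormats in place of ListRecords.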

Your feedback is welcome!

As our final piece, we have also created best practices for a number of specific metadata fields, and we look to bepress and our IR colleagues to cast a critical eye over them.

We have looked carefully at Digital Commons standard mapping to Dublin Core, broader Dublin Core practices, and DataCite guidelines (which are preferred by SHARE). We are hoping the broader Digital Commons community can give feedback on our recommendations. We are hoping that if many of us can agree on the same practices, it will be easier for bepress and SHARE to implement.

Our best practices document is here and we would welcome comments on the document and discussion on the list, or you can contact any of us directly with comments: Best Practices for Mapping Digital Commons Metadata for Harvesting by SHARE

[Three slide images from “BePress Metadata and SHARE”]

playing with Tableau

OA in Action: the ups and downs of the Institutional Repository

Also Published on the Open Book Publishers Blog

Institutional Repositories (IRs) have become a staple of most academic libraries. They are generally populated by Electronic Theses and Dissertations (ETDs) and faculty peer-reviewed manuscripts, also known as post-prints in the parlance of green open access. The IR managed by Western Libraries at The University of Western Ontario, Scholarship@Western, aspires to host the complete intellectual output of the institution. It is home to faculty articles, working papers, over 25 open access journals, several conference proceedings, and some digitized materials such as photographs, music scores, and historical university course calendars.

The inclusion of electronic theses and dissertations is a considerable boon to the IR: they recently achieved just over 2 million downloads worldwide and are one of the best drivers of web traffic to the repository.

The benefits for students are many, but the most compelling is a dramatic increase (50-250 per cent) in citation impact. Greater visibility for their work leads to higher citation counts, and therefore greater impact. This increased citation rate doesn’t just apply to ETDs but to all kinds of open access publications.

(SEE: SPARC Bibliography http://sparceurope.org/oaca_table/ )

There is often difficulty in getting faculty to participate, for a variety of good reasons: the administrative burden of uploading their work into yet another place, unclear benefits of the IR, uncertainty about which version can be hosted by the IR, and ineffective marketing of the significant benefits authors gain by using the repository.

It is true that repositories have not *yet* completely fulfilled their potential.

The sciences have typically been more broadminded about OA, and the nature of scientific publishing of articles lends itself to this.

Kathleen Shearer, COAR 

The Humanities, with its tradition of research disseminated in book format (and books are often years in the making), seems less inclined to participate in open access. However, the times, they are a-changin’.

Recently, Scholarship@Western has ventured beyond ETDs and green OA with the addition of three new e-books.

Our first book, Ukraine’s Euromaidan: Broadcasting through Information Wars with Hromadske Radio by Marta Dyczok, is a study of unfolding current events, so the timely release of information was important. Dr. Dyczok looks at Public Radio Ukraine’s decision “to provide accurate and objective information to audiences – free of state and corporate censorship and any kind of manipulation. . .This book brings together a series of English language reports on the Ukraine crisis first broadcast on Hromadske Radio between 3 February 2014 and 7 August 2015.”

The upload of the book  coincided with Dyczok’s most recent and related project, a photo exhibit titled Faces of Displacement in Ukraine, the preliminary results of  interviews with 70 internally displaced people of Crimea and the Ukraine.  

I would love to publish some select photos in an image gallery in the IR  that would reside alongside this book. My belief is that such a display would add impact to the text and also draw in more readers, helping to shed light on the dramatic facts and the discussions inside Ukraine during the crisis.

The second publication is a local history book, How Middlesex County was Settled with Farmers, Artisans, and Capitalists: An Account of the Canada Land Company in Promoting Emigration from the British Isles in the 1830s through the 1850s by Marvin L. Simner. This work is of great interest to local historians and genealogists. Print copies, available at local libraries, are hard to come by, and because this kind of research is scarce, libraries are typically reluctant to lend them to other institutions via inter-library loan. Hosting this book in the repository means that family historians and other interested parties have unfettered access to the research, readily discoverable by internet search engines, at any time.

Our third work is a social and cultural history never told before: Esprit de Corps: A History of North American Bodybuilding by James Woycke, the first comprehensive history of bodybuilding in North America. This book reveals how the leadership of Montreal brothers Ben and Joe Weider was central to founding the world’s premier bodybuilding organization, the International Federation of Bodybuilders (1947).

This intriguing history is not the only story this book has to tell. Professor Woycke’s draft of Esprit de Corps was completed at the onset of a debilitating illness. Following his death, friend and colleague Rod Millard discovered the manuscript in his papers and, enlisting the aid of sport historian Dr. Craig Greenham, prepared the draft for publication. Unable to find a traditional publisher to take on this history, Professor Millard found a perfect partner in the university’s institutional repository.

Rod Millard remarked:

I have come to realize that it is going to have more readers through Scholarship@Western than it would have ever had in print.

An excellent account of the Esprit de Corps story appears in Western News: “Book a tribute to heft as scholar, weight of friendship,” courtesy of Jason Winders, Director, Editorial Services.

Benchmarking the Institutional Repository, Scholarship@Western 2015-16

with thanks to Vince Gray, Data Librarian, for his kind assistance.

Growth: Calculates the number of digital objects added in this time period!

Our Theses and Dissertation Series continues to be our strength! Thanks SGPS 🙂 Two other series that have grown exponentially are the Department of Economics Working Paper Archive  and the Aboriginal Policy Research Consortium International

Breadth: Calculates the number of Subject areas that have at least one new object added.

Demand: Calculates the average number of downloads per digital object. This year we have an average of 70 downloads per item.
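The three measures above are simple enough to compute from an export of item records. The sketch below is a minimal illustration under stated assumptions: the field names (added, subject, downloads) are hypothetical, and Demand is taken over every object in the repository rather than only the new ones, which may differ from how our own figures were produced.

```python
from datetime import date


def benchmark(items, period_start, period_end):
    """Compute Growth, Breadth, and Demand for a reporting period.

    Each item is a dict with hypothetical keys:
      "added" (date), "subject" (str), "downloads" (int).
    """
    new_items = [i for i in items if period_start <= i["added"] <= period_end]
    # Growth: number of digital objects added in the period.
    growth = len(new_items)
    # Breadth: number of subject areas with at least one new object.
    breadth = len({i["subject"] for i in new_items})
    # Demand: average downloads per digital object (assumed: all objects).
    demand = sum(i["downloads"] for i in items) / len(items) if items else 0.0
    return {"growth": growth, "breadth": breadth, "demand": demand}
```

Called with a list of item dicts and the first and last dates of the reporting year, it returns the three figures in one dictionary.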

Western Faculty and Graduate Students: if you wish to showcase your research in the Open Access Institutional Repository, Scholarship@Western, you may create a free account and self-archive your work. If you need any assistance, feel free to contact your friendly neighbourhood librarian or email wlscholcomm@uwo.ca

Scholarship@Western: Keeps on Keepin’ on!

Monthly Readership Totals: 
Last month, Scholarship@Western had 77448 full-text downloads and 239 new submissions were posted, bringing the total works in the repository to 16326. 
The most popular papers were: 
Evaluating the Montreal Cognitive Assessment (MoCA) and the Mini Mental State Exam (MMSE) for Cognitive Impairment Post Stroke: A Validation Study against the Cognistat (2024 downloads)
http://ir.lib.uwo.ca/etd/852
Topic Shift in Casual Conversation (714 downloads)
http://ir.lib.uwo.ca/totem/vol2/iss1/8
The Effects of Music on Memory for a Word List (603 downloads)
http://ir.lib.uwo.ca/hucjlm/vol50/iss1/4