Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem

Introduction

Background

Ensuring reproducibility in biomedical, clinical, behavioral, and social science research is crucial yet challenging, affecting every stage of the research process, from study design to results reporting [-]. Core issues include inconsistencies in research protocol implementation within studies over time (eg, across generations of researchers in a laboratory) and across study sites (eg, multisite projects), variable data collection methods, unclear documentation of methodological choices, selective reporting practices, and limited transparency in data and code sharing [,]. While methodological diversity is essential for addressing diverse research questions, reproducibility relies on transparent reporting of methodological decisions to minimize researcher degrees of freedom. Efforts to improve reproducibility often focus on data analysis and dissemination, yet inconsistencies in questionnaire-based data collection are frequently overlooked. Such inconsistencies, whether in structured psychological assessments or general-purpose surveys, undermine internal reproducibility in multisite and longitudinal studies, reducing data comparability and introducing systematic biases.

Inconsistencies in survey-based data collection arise from multiple factors, including variability in translations across languages [,], differences in how constructs are operationalized [], selective inclusion of questionnaire components [,], variations in clinical diagnosis criteria [,], and inconsistencies in versioning across research teams and time points. Even minor modifications, such as alterations in branch logic, response scales, or scoring calculations, can significantly impact data integrity, particularly in longitudinal studies [].

These discrepancies have profound consequences in both clinical and research contexts. In clinical settings, even slight deviations in assessment methods can lead to divergent patient outcomes, particularly in multicultural environments where assessment translations need proper cultural adaptation. In research, such inconsistencies undermine study integrity and bias conclusions, posing challenges for meta-analyses and large-scale collaborative studies that require harmonized datasets.

Several initiatives have sought to improve research reproducibility. The findability, accessibility, interoperability, and reusability (FAIR) principles [] provide high-level guidance for data management and sharing, ensuring that research data are well documented, discoverable, and reusable. While FAIR does not directly address study reproducibility, its principles support transparency and consistency in data handling, which are critical for reproducibility efforts. However, these principles primarily focus on postcollection data curation and accessibility, leaving gaps in standardizing survey-based data collection at the source. Similarly, resources such as the Cognitive Atlas [] and Cognitive Paradigm Ontology [] have helped standardize terminologies for cognitive research but do not define data elements such as survey questions and allowable responses. The National Institute for Mental Health Data Archive [] and the National Library of Medicine Common Data Elements [] initiative promote standardized data elements, yet their implementation remains inconsistent across studies.

Widely used survey platforms such as Qualtrics (Qualtrics International Inc) and REDCap (Research Electronic Data Capture; Vanderbilt University) [] provide structured tools for data collection but do not inherently enforce assessment standardization, version control, or interoperability across research teams and time points. While these platforms allow researchers to create and distribute surveys, they do not provide mechanisms to systematically track changes or ensure that identical constructs are measured consistently over time. To address these challenges, a schema-driven approach is needed to define and enforce standardized survey structures, ensuring consistency in question formats, response options, and metadata across studies. The Center for Expanded Data Annotation and Retrieval (CEDAR) Metadata Model [] provides a structured system for biomedical data annotation, but its primary focus is on postdata collection metadata management rather than ensuring consistency during data collection.

Despite these efforts, ensuring consistency in survey-based data collection remains insufficiently addressed, particularly in longitudinal studies and multiteam research projects, where maintaining assessment comparability over time and across sites is critical. Without a structured framework for defining, managing, and versioning questionnaires at the point of data collection, researchers often face time-intensive and error-prone processes of harmonizing disparate datasets during later stages of analysis. A systematic solution for ensuring consistent data collection within a study—across time and research teams—while allowing flexibility for study-specific requirements is necessary to improve research integrity and facilitate large-scale data sharing and reuse.

Objectives

ReproSchema is a schema-driven ecosystem (Figure 1) that integrates a foundational schema with supporting tools to standardize survey-based data collection and facilitate reproducibility. At its core, the foundational schema structures and defines assessments, including standardized psychological scales, clinical questionnaires, and general-purpose surveys, by linking each data element (eg, survey response and experimental measurements) with its metadata, such as collection method, timing, and conditions. This structured approach ensures consistency across studies, supports version control, and enhances data comparability and integration.


Figure 1. ReproSchema workflow overview. This figure illustrates the 6 core components of the ReproSchema ecosystem. (A) Input sources include Research Electronic Data Capture (REDCap)–formatted CSVs (to be converted using redcap2reproschema), large language model–parsed questionnaire files (eg, PDFs), and a reusable JSON for linked data (JSON-LD) assessment library. (B) Protocols are curated using the reproschema-protocol-cookiecutter tool, which structures metadata, enforces schema validation, and facilitates deployment. (C) GitHub repository versions of each protocol and its components, assigning persistent uniform resource identifiers (URIs) and serving associated web interfaces. (D) The browser-based reproschema-ui enables interactive survey deployment and data collection. (E) Survey responses are stored in structured JSON-LD with embedded provenance and links to standardized schema elements. (F) Output data can be converted to various target formats, including National Institute of Mental Health National Data Archive (NDA) Common Data Elements (reproschema2cde), BIDS phenotype format (reproschema2bids), and REDCap CSV (output2redcap), facilitating downstream harmonization and reuse. UI: user interface.

Beyond the foundational schema, ReproSchema comprises six essential components:

A library (reproschema-library) of standardized, reusable assessments, each formatted in JSON-LD [], providing a structured and versioned resource for common research instruments
A Python package (reproschema-py) that supports schema creation, validation, and conversion to formats compatible with existing data collection platforms such as REDCap and the Fast Healthcare Interoperability Resources (FHIR) standard
A user interface (UI; reproschema-ui) designed for interactive survey deployment, with ongoing development to enhance integration with customized back ends
A back-end server (reproschema-backend) for secure survey data submission, using token-based authorization for client registration and data transfer, with support for structured data storage and management
A protocol template (reproschema-protocol-cookiecutter) that enables researchers to create and customize research protocols using the standardized assessments and UI
A Docker container (reproschema-server) that integrates the UI (reproschema-ui) and back end (reproschema-backend) to provide a unified platform for deploying protocols and collecting survey data using widely available cloud container services

These components operate both as an integrated system and as stand-alone tools, allowing flexibility in research implementation. Unlike conventional survey platforms that primarily provide graphical user interface–based survey creation, ReproSchema prioritizes schema-based standardization, metadata integration, and interoperability, ensuring that survey elements remain consistent across studies and over time.

Figure 1 presents the ReproSchema workflow, which standardizes survey data collection to enhance research reproducibility and interoperability across studies. The workflow consists of 6 key components. First, ReproSchema supports multiple input formats, including questionnaires in PDF or DOC format (which can be converted to ReproSchema format using large language models, such as Claude 3.7 Sonnet by Anthropic, as demonstrated in ), existing assessments from the ReproSchema library, and REDCap CSV exports (which can be automatically converted using redcap2reproschema). Second, the reproschema-protocol-cookiecutter tool provides a structured, stepwise process for researchers to create and publish a protocol on GitHub, ensuring organized metadata and version control. This tool enables schema validation and UI serving. Third, ReproSchema protocols are stored in GitHub repositories (or other Git-compatible services), where version-controlled uniform resource identifiers (URIs) ensure persistent access to protocols, activities, and assessment items, supporting reproducibility and provenance tracking. Fourth, the reproschema-ui provides a browser-based interface for interactive survey deployment, allowing researchers to deploy surveys and participants to complete them while maintaining schema integrity. Fifth, survey responses are stored in JSON-LD format, with embedded URIs linking each protocol, activity, and item to their respective sources in the ReproSchema library. This structure ensures data provenance, traceability, and semantic interoperability. Sixth, the reproschema-py tools facilitate output conversion into various standardized formats, including the National Institute of Mental Health (NIMH) Common Data Elements (reproschema2cde), the Brain Imaging Data Structure phenotype format (reproschema2bids), and REDCap CSV format (output2redcap), ensuring compatibility with existing research workflows.
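To make the fifth step more concrete, the following minimal sketch shows how a single response could carry URIs that point back to its protocol, activity, and item. The vocabulary prefix, key names, repository, and commit hash are all hypothetical placeholders chosen for illustration; they are not the ReproSchema vocabulary or the output of reproschema-ui.

```python
import json

# Hypothetical commit-pinned base URI; owner, repository, and commit are placeholders.
BASE = "https://raw.githubusercontent.com/example-org/example-protocol/0123abc"


def build_response(protocol_path, activity_path, item_path, value):
    """Return a JSON-LD-style record linking one answer to its schema sources."""
    return {
        # The @context maps short keys to IRIs; this vocabulary is illustrative only.
        "@context": {
            "ex": "https://example.org/vocab#",
            "usedProtocol": {"@id": "ex:usedProtocol", "@type": "@id"},
            "usedActivity": {"@id": "ex:usedActivity", "@type": "@id"},
            "isAbout": {"@id": "ex:isAbout", "@type": "@id"},
            "value": "ex:value",
        },
        "@type": "ex:Response",
        "usedProtocol": f"{BASE}/{protocol_path}",
        "usedActivity": f"{BASE}/{activity_path}",
        "isAbout": f"{BASE}/{item_path}",
        "value": value,
    }


record = build_response(
    "protocol_schema",
    "activities/phq9/phq9_schema",
    "activities/phq9/items/phq9_2",
    1,
)
print(json.dumps(record, indent=2))
```

Because every link is a full URI pinned to a commit, a reader of the stored response can resolve exactly which version of the question was asked without consulting a separate codebook.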

This paper introduces ReproSchema as a comprehensive framework for addressing inconsistencies in survey-based data collection, thereby enhancing research reproducibility. The Methods section details the conceptual foundation of ReproSchema, describes its components, and presents a comparative analysis against 12 survey platforms. In addition, three research use cases illustrate its applicability: (1) NIMH-Minimal (standardizing required mental health survey common data elements), (2) Adolescent Brain Cognitive Development (ABCD) and HEALthy Brain and Child Development (HBCD) Studies (tracking and managing changes in longitudinal assessments), and (3) Committee on Best Practices in Data Analysis and Sharing (eCOBIDAS; developing an interactive checklist for neuroimaging research best practices). The Results section examines the outcomes of these use cases and highlights how ReproSchema aligns with the FAIR principles. Finally, the Discussion section analyzes the comparative findings, summarizes ReproSchema’s contributions, and outlines current limitations and future directions.

Methods

ReproSchema’s Foundation and Implementation

Conceptual Framework

Overview

A schema, originating in information systems and database design, is a structured framework for organizing data []. The concept of schema has been applied to various fields, including web technologies [,], where it defines the structure and organization of web-based data. In the context of World Wide Web Consortium protocols [], schemas ensure web documents adhere to specific formatting and interoperability guidelines.

Beyond web technology, schemas are essential for tracing data provenance, providing transparency and accountability by tracking data’s origins, modification, and lineage [-]. In survey-based research, a structured schema maintains consistency and reliability in data collection, especially when assessments vary in complexity and format. A well-defined schema establishes explicit documentation for assessment structure, including question types, response formats, branch logic, and scoring interpretations, thereby ensuring transparency in methodological choices. This approach reduces errors and inconsistencies that arise when researchers implement the same assessment differently, such as variations in scoring methods or selective omissions of items, which can compromise data comparability and reproducibility.

ReproSchema is inspired by the schema-based principles of the CEDAR Metadata Model but has been developed independently to standardize survey-based data collection in biomedical, clinical, behavioral, and social science research. It addresses key challenges such as version control, metadata consistency, and interoperability with existing data collection platforms. This adaptation has resulted in three primary advances.

Integration With Established Standards

ReproSchema aligns with schema.org [] and the Neuroimaging Data Model [] to enhance data harmonization across studies. For example, schema.org provides a standardized way to describe survey elements, ensuring that metadata (eg, question labels and response formats) remain consistent across platforms. In neuroimaging, Neuroimaging Data Model integration allows seamless linking of behavioral assessments (eg, cognitive tests) with magnetic resonance imaging or functional magnetic resonance imaging data to statistical output [], enabling researchers to track relationships between questionnaire responses and brain activity. This alignment ensures that datasets remain interoperable, reusable, and compatible with large-scale collaborative studies and meta-analyses.

Incorporation of Linked Data Modeling Language

ReproSchema adopts Linked Data Modeling Language (LinkML), an open standard for defining and validating structured data models. LinkML enhances schema expressiveness and validation by enabling explicit data types, relationships, and constraints within survey structures. For example, it allows researchers to specify that a response should be an integer within a defined range (eg, an age field must be between 18 and 99 years) or enforce controlled vocabulary terms (eg, depression severity levels as “mild,” “moderate,” or “severe”). If a questionnaire version changes, such as modifying a Likert scale from a 5-point to a 7-point format, LinkML enables automated detection of such modifications, ensuring that data collected across different versions remains interpretable. This structured validation prevents data entry errors, version mismatches, and inconsistencies that could affect longitudinal analyses or multisite studies.
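The kinds of constraints described above can be illustrated with a small plain-Python analogue. This sketch deliberately does not use the LinkML toolchain or its API; the `FieldSpec` class and the `validate` and `definition_changed` helpers are hypothetical names introduced only to show range checks, controlled vocabularies, and detection of a changed response scale.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a schema-level field definition."""
    name: str
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    permitted: Optional[Tuple[str, ...]] = None  # controlled vocabulary, if any


def validate(spec: FieldSpec, value: object) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    if spec.permitted is not None:
        if value in spec.permitted:
            return []
        return [f"{spec.name}: {value!r} is not one of {spec.permitted}"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"{spec.name}: expected an integer, got {value!r}"]
    errors = []
    if spec.minimum is not None and value < spec.minimum:
        errors.append(f"{spec.name}: {value} is below the minimum {spec.minimum}")
    if spec.maximum is not None and value > spec.maximum:
        errors.append(f"{spec.name}: {value} is above the maximum {spec.maximum}")
    return errors


def definition_changed(old: FieldSpec, new: FieldSpec) -> bool:
    """Flag any difference between two versions of the same field."""
    return old != new


AGE = FieldSpec("age", minimum=18, maximum=99)
SEVERITY = FieldSpec("severity", permitted=("mild", "moderate", "severe"))
LIKERT_5 = FieldSpec("mood", minimum=1, maximum=5)
LIKERT_7 = FieldSpec("mood", minimum=1, maximum=7)

print(validate(AGE, 17))
print(validate(SEVERITY, "extreme"))
print(definition_changed(LIKERT_5, LIKERT_7))
```

In a schema-driven workflow these checks come from the schema definition itself rather than from hand-written code, which is what allows the same constraints to be applied identically at every site.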

Adaptation to Complex Research Structures

ReproSchema uses a nested structure (protocol>activity>item) to represent multilevel research designs (Figure 2). A protocol defines the overall study framework, such as a longitudinal mental health survey or a multisite clinical trial, and comprises multiple activities. Each activity represents a specific assessment, such as the Patient Health Questionnaire-9 for depression screening or a cognitive memory test. An item corresponds to an individual question or measurement, such as “Over the past two weeks, how often have you felt down, depressed, or hopeless?” with a Likert-scale response format. This hierarchical structure allows researchers to reuse standardized assessments, maintain version control, and ensure consistent data formatting across target platforms (eg, REDCap and FHIR), as explained in the subsequent practical implementation and use cases.
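The protocol>activity>item hierarchy can be sketched with three small data classes. The class and field names below are hypothetical and simplified for illustration; they do not reproduce the ReproSchema classes or the reproschema-py API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One question or measurement."""
    item_id: str
    question: str
    response_options: List[str]


@dataclass
class Activity:
    """One assessment, made of items."""
    activity_id: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Protocol:
    """One study framework, made of activities."""
    protocol_id: str
    activities: List[Activity] = field(default_factory=list)

    def item_paths(self) -> List[str]:
        """List every item as protocol/activity/item, in declaration order."""
        return [
            f"{self.protocol_id}/{activity.activity_id}/{item.item_id}"
            for activity in self.activities
            for item in activity.items
        ]


phq9 = Activity(
    "phq9",
    [
        Item(
            "phq9_2",
            "Over the past two weeks, how often have you felt down, depressed, or hopeless?",
            ["Not at all", "Several days", "More than half the days", "Nearly every day"],
        )
    ],
)
study = Protocol("example_study", [phq9])
print(study.item_paths())
```

Because an activity is defined once and referenced from a protocol, the same assessment object can be reused in several protocols without copying its items.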


Figure 2. Mapping a research protocol to ReproSchema. This figure illustrates how an assessment is structured and represented in ReproSchema, as well as how it appears in the user interface (UI). The entire assessment (outlined in red) is an activity. Each individual question (green) is an item, and the available answer choices (purple) are ResponseOptions. When a participant selects an answer (orange), that selection is recorded as a response. One protocol can have multiple activities. The right panel demonstrates how different activities within a protocol are organized in the UI, allowing users to navigate between different activities.

Practical Implementation

The implementation of ReproSchema introduces several innovative features aimed at enhancing research efficiency and data integrity.

Persistent URI Management via Git

ReproSchema uses Git [] for version control and GitHub, a cloud-based hosting service for Git repositories, for assigning persistent URIs []. GitHub serves as a web server that manages URIs for individual schema components, such as protocols, activities, and items. This setup ensures that any schema version remains retrievable through its unique Git commit, which functions as an immutable time stamp for a specific version. Because Git is a version control system that can be used independently of GitHub, any other Git-based service that provides persistent URLs can also host ReproSchema protocols. Surveys or questionnaires generated through ReproSchema inherit this version control functionality, enabling researchers to track changes to survey elements (eg, questions and response options) and facilitating study replication or long-term review.
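As an illustration of commit-based persistence, the sketch below builds a URI that is pinned to a specific commit by using GitHub's raw-content URL layout. The owner, repository, commit hash, and file path are placeholders, and the `pinned_uri` helper is a hypothetical name rather than part of any ReproSchema tool.

```python
import re


def pinned_uri(owner: str, repo: str, commit: str, path: str) -> str:
    """Build a raw-content URL pinned to a commit hash.

    Branch names such as "main" are rejected because they move over time,
    whereas a commit hash always identifies the same content.
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        raise ValueError(f"expected an abbreviated or full commit hash, got {commit!r}")
    return (
        "https://raw.githubusercontent.com/"
        f"{owner}/{repo}/{commit}/{path.lstrip('/')}"
    )


# Placeholder values for illustration only.
print(pinned_uri("example-org", "example-protocol", "0123abc", "activities/phq9/phq9_schema"))
```

Two URIs that differ only in the commit segment identify two versions of the same schema component, which is what makes changes between study waves traceable.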

Support for Various Question Types

The ReproSchema schema defines a diverse range of question types, including Likert scales, drop-down menus, multiple-choice questions, numerical inputs, audio or video inputs, free-text responses, and more. It also enables conditional visibility (ie, branching logic), where specific items or entire activities become visible based on participant responses.
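Conditional visibility can be sketched as a set of rules that are evaluated against the responses collected so far. In this simplified illustration, rules are ordinary Python callables and the item names are invented; it is not the expression syntax that ReproSchema or reproschema-ui uses.

```python
from typing import Callable, Dict, List

Responses = Dict[str, object]
Rule = Callable[[Responses], bool]


def visible_items(order: List[str], rules: Dict[str, Rule], responses: Responses) -> List[str]:
    """Return the items to display, keeping the declared order.

    An item without a rule is always visible; an item with a rule is
    visible only when the rule evaluates to True for the current responses.
    """
    return [item for item in order if item not in rules or rules[item](responses)]


# Hypothetical example: a follow-up question shown only to smokers.
ORDER = ["smokes", "cigarettes_per_day", "age"]
RULES: Dict[str, Rule] = {
    "cigarettes_per_day": lambda r: r.get("smokes") == "yes",
}

print(visible_items(ORDER, RULES, {"smokes": "no"}))
print(visible_items(ORDER, RULES, {"smokes": "yes"}))
```

The same mechanism extends from items to entire activities by attaching a rule to the activity rather than to a single question.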

Inclusion of Computable Elements

Beyond defining survey structures, ReproSchema incorporates computable elements to streamline data analysis and enhance automation. The schema allows for predefined calculations on numerical responses, such as computing sums, means, or SDs. For instance, in the DSM-5 Adult Self-Report Cross-Cutting Symptom Measure, ReproSchema can automate the computation of an anxiety subscore by averaging responses from the first 5 questions (eg, “Feeling nervous, anxious, or on edge” and “Not being able to stop or control worrying”).

This automated computation is particularly valuable in large-scale or longitudinal studies, where manual scoring increases the risk of errors and inefficiencies. Moreover, computed values can inform subsequent data collection steps, enabling adaptive survey designs. For example, if a participant’s computed anxiety score exceeds a predefined threshold, additional follow-up assessments could be triggered dynamically. By embedding computational logic within the schema, ReproSchema enables efficient, standardized, and adaptive data collection workflows.
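A computed subscore and a score-dependent follow-up can be sketched as follows. The item identifiers, the response values, and the threshold of 1.5 are assumptions made for this example; they are not clinical cutoffs and do not reflect how ReproSchema encodes computations.

```python
from statistics import mean
from typing import Dict, List


def subscore(responses: Dict[str, int], item_ids: List[str]) -> float:
    """Average the responses to the listed items, refusing to score partial data."""
    missing = [item for item in item_ids if item not in responses]
    if missing:
        raise ValueError(f"cannot compute subscore, missing items: {missing}")
    return mean(responses[item] for item in item_ids)


def needs_follow_up(score: float, threshold: float) -> bool:
    """Return True when the score exceeds the (study-defined) threshold."""
    return score > threshold


# Hypothetical identifiers for the first 5 questions and made-up responses.
ANXIETY_ITEMS = ["q1", "q2", "q3", "q4", "q5"]
answers = {"q1": 3, "q2": 2, "q3": 4, "q4": 1, "q5": 0, "q6": 4}

score = subscore(answers, ANXIETY_ITEMS)
print(score, needs_follow_up(score, threshold=1.5))
```

Raising an error on missing items, rather than silently averaging what is available, is one possible design choice; a study could equally define a rule for prorating partial responses, provided that rule is recorded in the schema.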

Integration of UI Elements

ReproSchema provides a graphical user interface (reproschema-ui) for survey execution, enabling participants to complete assessments interactively while ensuring structured data entry. The UI supports a range of input types, including numerical fields, text boxes, multiple-choice selections, sliders, and date pickers. In addition, it accommodates more advanced data capture methods, such as (1) file uploads for submitting scanned documents or images (eg, medical records and task responses), (2) audio checks and recordings for voice-based assessments (eg, speech fluency tasks in cognitive research), and (3) digital consent forms to ensure compliance with ethical requirements in online studies.

The UI is designed to be configurable through JavaScript and is hosted as an open-source GitHub repository, allowing researchers to modify and deploy it according to their study needs. While ReproSchema does not provide built-in survey hosting, researchers can deploy the UI alongside the back end using the Docker-based reproschema-server, which integrates both components into a unified deployment. This setup ensures flexibility in different research environments. For example, a researcher studying cognitive decline may deploy a reproschema-server instance to collect audio recordings from participants and store them securely for further analysis.

Interoperability

ReproSchema’s nested-layer design ensures compatibility with existing data collection and management platforms, enabling seamless conversion and integration with other research workflows. The schema supports bidirectional format conversion, allowing researchers to (1) convert ReproSchema-based assessments to REDCap surveys, ensuring compatibility with widely used clinical research infrastructure, and (2) translate surveys into the FHIR standard [], facilitating integration with electronic health record systems.

In addition, the modular schema design enables mapping between ReproSchema and other standardized formats. For instance, a research team conducting a multisite study across institutions using different survey tools may use ReproSchema to maintain a centralized, schema-defined questionnaire while allowing site-specific exports to local data collection platforms. We also provide tools to map the data collected through ReproSchema to the REDCap output format and the NIMH submission template.
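The idea of exporting a schema-defined item to a platform-specific format can be sketched with a small conversion to a REDCap-style data dictionary row. This is not the reproschema-py converter: the item dictionary layout is invented for the example, only a subset of data dictionary columns is written, and the column headers and choice syntax should be checked against the target REDCap instance before use.

```python
import csv
import io
from typing import Dict, List

# Subset of REDCap data dictionary columns (assumed; verify against your instance).
COLUMNS = [
    "Variable / Field Name",
    "Form Name",
    "Field Type",
    "Field Label",
    "Choices, Calculations, OR Slider Labels",
]


def item_to_row(form_name: str, item: Dict) -> Dict[str, str]:
    """Map one hypothetical schema item to one data dictionary row."""
    choices = " | ".join(f"{code}, {label}" for code, label in item["choices"])
    return {
        "Variable / Field Name": item["id"],
        "Form Name": form_name,
        "Field Type": "radio",
        "Field Label": item["question"],
        "Choices, Calculations, OR Slider Labels": choices,
    }


def to_csv(form_name: str, items: List[Dict]) -> str:
    """Serialize items to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(item_to_row(form_name, item))
    return buffer.getvalue()


example_item = {
    "id": "phq9_2",
    "question": "Feeling down, depressed, or hopeless",
    "choices": [(0, "Not at all"), (1, "Several days")],
}
print(to_csv("phq9", [example_item]))
```

Keeping the schema as the single source of truth and generating site-specific exports from it means that a change to a question propagates to every platform from one place.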

Comparison Analysis

Overview

This subsection compares ReproSchema’s capabilities to those of other platforms, focusing on the FAIR principles and generic functions found across survey platforms. Through the following selection process and 2 sets of comparison criteria, we highlight the importance of standardized protocols and assessments in improving research outcomes. This comparison evaluates ReproSchema’s role in enhancing the consistency and reliability of research data collection.

Identification and Selection of Platforms

On May 20, 2024, we conducted a web search to identify survey data collection platforms commonly used in biomedical, clinical, behavioral, and social science research. We used the keywords “online survey data collection research tool” and “online behavioral experiment data collection research tool” in 2 separate searches, including “behavioral experiment” to capture tools using questionnaires alongside experiments. We then complemented the search list by consulting relevant experts on tools commonly used in biomedicine, clinical trials, and mental health data collection.

Search results were categorized into platform websites and recommendation articles. Platform websites refer to the official sites of the platforms, while recommendation articles include sources listing recommended tools, such as “best data collection tools for research.” Results from these websites and articles were collected to form a comprehensive list. The search results were iterated until no new tools appeared in 10 consecutive results. Results from the 2 searches were then combined to ensure comprehensive coverage. Only tools with at least a web version available were included.
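The stopping rule used during the search can be expressed as a short procedure. In this sketch, each search result is represented as the list of tool names it mentions; the function name and the example data are hypothetical and are not the actual search results.

```python
from typing import Iterable, List


def collect_tools(results: Iterable[List[str]], patience: int = 10) -> List[str]:
    """Collect tool names until `patience` consecutive results add nothing new."""
    found: List[str] = []
    without_new = 0
    for tools_in_result in results:
        added = False
        for tool in tools_in_result:
            if tool not in found:
                found.append(tool)
                added = True
        # Reset the counter whenever a result contributes a new tool.
        without_new = 0 if added else without_new + 1
        if without_new >= patience:
            break
    return found


# Made-up results: two informative pages, then 10 pages with nothing new.
example = [["Tool A"], ["Tool B", "Tool A"]] + [["Tool A"]] * 10 + [["Tool C"]]
print(collect_tools(example))
```

In this example, the search stops before reaching the final result, so a tool that appears only after a long run of uninformative results is not captured; this is the trade-off inherent in any fixed stopping rule.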

A total of 53 tools were identified: 4 (8%) from the expert panel, 36 (68%) from the “online survey data collection research tool” search, and 13 (25%) from the “online behavioral experiment data collection research tool” search. We filtered those tools based on 2 criteria: research and academia orientation and primary functionality type. Tools were labeled “yes” for research and academia orientation if their websites explicitly indicated use for research purposes or if research institutions were mentioned as primary users. The primary functionality types were categorized as questionnaire focused, experiment and questionnaire, and experiment focused. For example, if a platform only supports using questionnaires in conjunction with an experiment, we label this platform as “experiment focused.” Of the 53 tools, 24 (45%) were identified as research and academia focused. From these, we filtered out 46% (11/24) of the tools that were experiment focused, leaving 54% (13/24) of the tools.
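The counts and rounded percentages reported for the selection funnel can be checked with a few lines of arithmetic. The numbers below are taken from the text; the variable names are introduced only for this check.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total = 53
by_source = {"expert panel": 4, "survey search": 36, "experiment search": 13}
assert sum(by_source.values()) == total

research_focused = 24
experiment_focused = 11
remaining = research_focused - experiment_focused  # tools kept after filtering

for source, count in by_source.items():
    print(f"{source}: {count}/{total} = {pct(count, total)}%")
print(f"research focused: {research_focused}/{total} = {pct(research_focused, total)}%")
print(f"experiment focused: {experiment_focused}/{research_focused} = {pct(experiment_focused, research_focused)}%")
print(f"remaining: {remaining}/{research_focused} = {pct(remaining, research_focused)}%")
```

The three source percentages sum to 101% because each is rounded independently.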

We then verified the remaining platforms using Google Scholar, retaining those that had their own publication or were widely used in publications (ie, at least 10 publications). One platform was excluded due to insufficient publication support. In total, 12 platforms were included in our final comparison. The selection process is illustrated in Figure 3 as a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) chart, with a full list of platforms and selection criteria provided in [,-].


Figure 3. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart. This figure illustrates the selection process for survey and experiment data collection platforms, narrowing from 53 identified tools to 12 final selections based on research orientation, functionality, and publication support.

Comparison Criteria

We developed 2 sets of comparison criteria to evaluate ReproSchema and the 12 selected tools. The first set focuses on overarching principles for data and metadata management and reusability, while the second set examines practical aspects of tool functionality.

In the first set, we first asked whether the platform had a schema and then evaluated adherence to the FAIR principles (Textbox 1). While originally developed for data stewardship, FAIR principles also contribute to reproducibility by ensuring that research data, including protocols, are well documented and structured to facilitate reuse across studies. The FAIR principles emphasize that data and metadata should be well documented, easily discoverable, accessible under clear conditions, interoperable through shared standards, and reusable through detailed provenance and licensing information.

In the context of survey and questionnaire design and curation before data collection, we interpret “data” as elements within a questionnaire (eg, single questions and response options) and “metadata” as information associated with the questionnaire and its elements during design. In Textbox 1, we provide the original FAIR principles and our adapted version side by side to ensure clarity and operationalize these principles for our comparison. Because the FAIR principles serve as general guidelines for scientific data management and our comparison focuses specifically on the survey and questionnaire design phase, some criteria may not be directly applicable in our context. Nonetheless, we listed all criteria for completeness but excluded 1 less relevant criterion (R4) from the comparison, as the focus of our work does not involve the relevant standards in the field.

Textbox 1. Adapted findability, accessibility, interoperability, and reusability (FAIR) principles used as the comparison criteria.

Findability

F1: Does the platform assign unique and persistent identifiers to questionnaires and their elements (eg, questions and response options)?
F2: Does the platform support metadata to describe questionnaires, including description, version, date created, etc?
F3: Is it easy to associate metadata with the underlying questionnaire?
F4: Are questionnaires and their elements indexed in a searchable database or repository?

Accessibility

A1: Can questionnaires and their elements be accessed using standardized protocols (eg, HTTP and Representational State Transfer Application Programming Interface)?
A2: Are the protocols used for accessing questionnaires open, free, and widely supported?
A3: Does the platform support secure access mechanisms (eg, OAuth [Open Authorization] and application programming interface keys)?
A4: Is the metadata describing the questionnaires accessible even if the actual questionnaire’s content is removed or archived?

Interoperability

I1: Are standardized formats used for representing surveys and their elements (eg, JSON and XML)?
I2: Does the platform use standardized sets of terms and definitions to describe questionnaires and their parts (such as questions and answer choices)?
I3: Are references to related questionnaires, question items, or external resources included?

Reusability

R1: Detailed metadata: Is detailed metadata provided for questionnaires and their elements, covering all relevant aspects (eg, methodology and intended use)?
R2: Are clear and accessible use licenses provided for surveys and their elements?
R3: Is detailed provenance information available for surveys and their elements, including creation and modification history?
R4: Do surveys and their elements comply with relevant standards in the field? (excluded)

Our second set of criteria (Textbox 2) focuses on general survey platform functions, emphasizing survey design and curation rather than post–data collection procedures. To support consistent evaluation, we anchored the “shared assessments” criterion in widely used mental health instruments (eg, Patient Health Questionnaire-9 and Generalized Anxiety Disorder-7). These assessments offer publicly available, well-defined formats that enable us to determine whether platforms accommodate the reuse of validated instruments. Although this choice draws from mental health research, the intent is not to restrict the scope of evaluation to a single domain. Instead, it reflects a practical approach to assessing platform support for structured and reproducible assessment workflows that extend across research contexts.

Our comparison applied different approaches for the 2 sets of criteria. For the comparison of FAIR principles, we reviewed each platform’s associated publications (specifically publications about the platform itself) and its documentation. This approach was chosen because the FAIR principles relate to design guidelines, which should be reflected in the platform’s foundational design principles.

For the functionality comparison, we relied on platform documentation and created test accounts where applicable to assess the user experience firsthand. In cases where creating a test account was unsuccessful, we referred to . This method was chosen because the functionality set is more user oriented, and evaluating it through direct interaction or online documentation provides a practical perspective on the platform’s usability and features.

It is important to note that some platforms (eg, Longitudinal Online Research and Imaging System [LORIS]) provide functions beyond survey design and data collection; however, our comparison only considers the survey-related features. Consequently, our comparison results depend heavily on the platform’s documentation and features available at the time of comparison, which may not always reflect the latest state of the platform.

Textbox 2. The functionality comparison criteria.

Shared assessments: whether the platform provides at least 1 of the widely used mental health assessments (eg, Patient Health Questionnaire-9 and Generalized Anxiety Disorder-7) for immediate use
Multilingual: whether the platform supports multiple languages for assessments
Multimedia: whether users can directly upload multimedia clips, such as video or audio, into assessments
Validation: whether the platform offers customizable mechanisms to ensure that entered data meets specific criteria, such as age within a predetermined range
Logic: whether the platform supports skip or branch logic, allowing the response to a question to determine the subsequent question
Score: whether users can implement derivative calculations based on questionnaire responses
Adaptability: whether the platform is optimized for use across various devices, including mobile phones, tablets, and desktops
Noncode: whether the platform is accessible to researchers without programming skills

ReproSchema in Research: Overview of Use Cases

This subsection demonstrates ReproSchema’s practical utility in research through 3 use cases of survey design: the NIMH-Minimal initiative [], the ABCD [] and HBCD Studies [], and the eCOBIDAS []. These examples highlight ReproSchema’s adaptability and ability to provide tailored solutions for various research needs.

The NIMH-Minimal case illustrates the comprehensive use of ReproSchema in developing standardized data collection protocols. The ABCD and HBCD cases showcase ReproSchema’s utility in standardizing and tracking changes to data elements for longitudinal studies. The eCOBIDAS case demonstrates ReproSchema’s flexibility across survey types and research contexts.

While all 3 originate from mental health or neuroimaging contexts, their core challenges are broadly applicable. These include tracking historical changes across study waves; managing multisite collaborations; enforcing standardized protocols; and implementing interactive, logic-based assessments. ReproSchema addresses these challenges at the structural level, making the solutions transferable across domains. The Results section further details each use case and the specific problems ReproSchema was designed to solve.
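As a rough illustration of what tracking changes across study waves involves, the sketch below compares two versions of a single item definition and reports the fields that differ. The dictionary fields (`question`, `choices`, `required`) and the `diff_items` helper are hypothetical; this is not ReproSchema’s actual format or tooling, only a minimal example of the underlying idea.

```python
def diff_items(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    changed = {}
    for field in sorted(set(old) | set(new)):
        if old.get(field) != new.get(field):
            changed[field] = (old.get(field), new.get(field))
    return changed


# Hypothetical definitions of the same item in two study waves.
wave_1 = {
    "question": "How often do you feel tired?",
    "choices": [0, 1, 2, 3],
    "required": True,
}
wave_2 = {
    "question": "How often do you feel tired?",
    "choices": [0, 1, 2, 3, 4],  # response scale extended in the later wave
    "required": True,
}

changes = diff_items(wave_1, wave_2)
```

Here only the response scale changed, which is exactly the kind of small modification that can affect comparability in longitudinal data if it goes unrecorded.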

Ethical Considerations

This study did not involve the collection of identifiable personal data, interaction with human participants, or access to confidential records. As such, ethical review and approval were not required, in accordance with the guidelines of the Massachusetts Institute of Technology’s Committee on the Use of Humans as Experimental Subjects.

Results

Overview of ReproSchema Components

The ReproSchema project integrates 5 key components designed to standardize research protocols and enhance consistency across various stages of data collection. These components are briefly described in Table 1, together with their key features.

These components, designed to function both independently and together, form a comprehensive toolkit for a range of research needs, from creating individual assessments to designing entire study protocols. The ReproSchema project emphasizes flexibility, collaboration, and accessibility, aiming to facilitate a cohesive research process that is both standardized and adaptable to specific research contexts.

Our documentation provides more detailed and up-to-date technical information. It includes a complete walkthrough to help users set up a new data collection form and describes how to reuse and contribute to the existing library of assessments.

Table 1. Each component’s description and key features.

| Component | Description | Key features |
| --- | --- | --- |
| Foundational schema (reproschema) | Provides the structured framework for defining and linking data elements, ensuring consistency, interoperability, and standardization across studies | Provides a standardized framework for structuring questionnaires across diverse research domains, independent of specific topics; standardizes organization, from specific questionnaire items to the entire research protocol |
| Assessment library (reproschema-library) | Provides a collection of standardized questionnaires for various research needs | Covers a wide range of assessments in clinical, cognitive, and behavioral sciences; features version tracking and updating for each questionnaire; open for researchers to add new assessments |
| Python-based CLI^a tool (reproschema-py) | Aids in creating, validating, and converting between formats | Standardizes schema creation and validation; provides conversion tools between other formats (eg, REDCap^b and FHIR^c) and ReproSchema formats |
| UI^d (reproschema-ui) | Enhances the setup and management of studies, improving data collection | Offers a user-friendly platform for visualizing and adjusting research protocols; flexible use, either stand-alone or integrated with the assessment library |
| Back end (reproschema-backend) | Manages survey data submission and storage, enabling structured, reproducible data collection with token-based authorization | Supports token-based authentication for secure data submission; provides API^e end points for integrating with front-end interfaces and external systems |
| Protocol template (reproschema-protocol-cookiecutter) | Offers a user-friendly template for researchers to develop standardized research protocols | Provides a step-by-step guide for protocol development; includes detailed documentation and examples, facilitating implementation |
| Docker container (reproschema-server) | Integrates the UI and back end, streamlining deployment for researchers | Bundles front end and back end into a single deployable unit; simplifies setup and hosting using Docker |

^a CLI: command line interface.

^b REDCap: Research Electronic Data Capture.

^c FHIR: Fast Healthcare Interoperability Resources.

^d UI: user interface.

^e API: application programming interface.

Comparison Results

This analysis compares ReproSchema with 12 platforms across 3 distinct research domains: clinical research, general surveys and questionnaires, and online behavioral experiments. The platforms include CEDAR [], formr [], KoboToolbox [], LORIS [], MindLogger [], OpenClinica [], Pavlovia [], PsyToolkit [,], Qualtrics [], REDCap [], SurveyCTO [], and SurveyMonkey [].

Table 2 provides an overview of each platform and its version at the time of comparison.

Table 2. Platform overview and version or release date.

| Platform | Description | Version or release date |
| --- | --- | --- |
| CEDAR^a | A metadata management tool focused on enhancing the annotation and retrieval of biomedical research data, adhering to FAIR^b principles | v1.6.0 |
| formr | A survey framework focusing on longitudinal online studies, integrating survey tools with data analysis capabilities | v0.21.0 |
| KoboToolbox | A globally used platform for data collection, management, and visualization in research and social impact initiatives | v2.024.12b |
| LORIS^c | A web-based data management system specializing in neuroimaging and behavioral research that can handle longitudinal multisite study data | v25.0.1 |
| MindLogger (rebranded as Curious []) | A mobile and web platform that collects, manages, and analyzes mental health and behavioral data | v1.25.0 |
| OpenClinica | An electronic data capture platform optimized for clinical trial data management | Community edition (OpenClinica 4) |
| Pavlovia [] | A platform for hosting, sharing, and running online experiments and surveys | v2024.1.4 (PsychoPy version) |
| PsyToolkit | A website for designing, running, and analyzing online experiments and surveys in cognitive and personality psychology | v3.4.6 |
| Qualtrics | A survey tool offering functionalities for academic research, market studies, and customer feedback, with analytics capabilities | May 2024 |
| REDCap^d | A web application for building and managing online surveys and databases used in clinical and research data collection | v13.8.2 |
| ReproSchema | A system designed for standardizing and sharing research protocols, ensuring reproducibility and consistency across research settings | v1.0.0 |
| SurveyCTO | A data collection platform specialized in mobile data and working in offline settings | v2.81.3 |
| SurveyMonkey | An online survey development software tool that provides survey creation and analysis tools for diverse research needs | May 2024 |

^a CEDAR: Center for Expanded Data Annotation and Retrieval.

^b FAIR: findability, accessibility, interoperability, and reusability.

^c LORIS: Longitudinal Online Research and Imaging System.

^d REDCap: Research Electronic Data Capture.

Table 3 presents a comparative analysis of each platform’s performance, evaluated against the adapted FAIR principles. The assessment criteria consisted of 4 categories: findability (F1-F4), accessibility (A1-A4), interoperability (I1-I3), and reusability (R1-R3). The results reveal varying degrees of compliance across the platforms, with some demonstrating stronger adherence to the criteria than others. Table 4 presents each platform’s functional capabilities, highlighting their ability to support key functions related to survey design and data management.

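Table 3. Comparison based on the adapted findability, accessibility, interoperability, and reusability (FAIR) principles. The assessment criteria consisted of 4 categories: findability (F1-F4), accessibility (A1-A4), interoperability (I1-I3), and reusability (R1-R3).

Columns, in order: Platform, Schema, F1, F2, F3, F4, A1, A2, A3, A4, I1, I2, I3, R1, R2, R3. Each row lists its check marks in column order; [blank] marks a run of 1 or more empty cells.

CEDAR^a: ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
formr: [blank] ✓ [blank] ✓✓✓✓ [blank] ✓✓ [blank] ✓
KoboToolbox: [blank] ✓ [blank] ✓✓✓✓ [blank] ✓✓ [blank] ✓
LORIS^b: [blank] ✓ [blank] ✓✓✓✓✓✓✓ [blank] ✓
MindLogger: ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
OpenClinica: [blank] ✓ [blank] ✓✓✓✓ [blank] ✓✓ [blank] ✓
Pavlovia: [blank] ✓✓✓✓ [blank] ✓✓ [blank] ✓
PsyToolkit: [blank] ✓✓✓✓ [blank] ✓✓ [blank]
Qualtrics: [blank] ✓✓✓✓ [blank] ✓✓ [blank] ✓
REDCap^c: [blank] ✓✓✓✓✓✓ [blank] ✓✓✓ [blank] ✓
ReproSchema: ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
SurveyCTO: [blank] ✓ [blank] ✓✓✓✓ [blank] ✓✓ [blank]
SurveyMonkey: [blank] ✓✓✓ [blank] ✓✓ [blank]

^a CEDAR: Center for Expanded Data Annotation and Retrieval.

^b LORIS: Longitudinal Online Research and Imaging System.

^c REDCap: Research Electronic Data Capture.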

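Table 4. Comparison based on functionality.

Columns, in order: Platform, Shared assessment, Multilingual, Multimedia, Validation, Logic, Score, Adaptability, Noncode. Each row lists its check marks in column order; [blank] marks a run of 1 or more empty cells.

CEDAR^a: ✓ [blank] ✓ [blank]
formr: [blank] ✓✓✓✓✓✓✓
KoboToolbox: [blank] ✓✓✓✓✓✓
LORIS^b: [blank] ✓✓✓✓✓✓
MindLogger: ✓✓✓✓✓✓✓✓
OpenClinica: [blank] ✓✓✓✓✓✓✓
Pavlovia: [blank] ✓✓✓✓ [blank]
PsyToolkit: ✓ [blank] ✓✓✓✓ [blank] ✓
Qualtrics: [blank] ✓✓✓✓✓✓✓
REDCap^c: ✓✓✓✓✓✓✓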

JOURNAL OF MEDICAL INTERNET RESEARCH
