Since this app is exclusively being used for EXPECT’s AEP copper data extraction right now, I’m writing these instructions for myself, Pierre, Camden, Li, Knut Erik, and anyone else I’ve roped in to help with data extraction.

Some tips:
  1. Paper list here. Columns should be self-explanatory.
  2. Save extractions (ongoing and completed) on Google Drive so others can take over if needed.
  3. Use Download All frequently. The app is stable, but this prevents lost work.
  4. Check for Supplementary Information/Supporting Information (ESI) first. If it contains raw data tables, merge it with the main PDF before uploading.
  5. Use two screens - one for the app, one for the paper. Some UI elements need wider screens.
  6. You can work in Excel if easier, but use the controlled vocabulary.
  7. Since the migration to NIVA's cloud server, entered data is retained between sessions. This can be useful, but hit Ctrl + Shift + R before starting a new extraction to clear any existing data.

If something breaks, email saw@niva.no or log an issue here with screenshots and your extraction file.

Instructions and tooltips (mouse over for more information) are available on each page. For an FAQ and more detailed instructions, go to the Info tab.
Quick Start Buttons
Convenience
  • Download extracted/entered data as a .zip of CSVs, then reupload via the landing page to continue later.
  • The app should now crash less often, and return at least vaguely informative error notifications when something does go wrong.
  • Added a manual field for setting the extraction token limit (i.e. the maximum document size). I haven't tested this a great deal, but for one 15-page paper I had to bump the limit up to 10,000 to get data out.
Sites
  • When adding sites from the map you can now choose what precision to report coordinates at.
  • Replaced COUNTRY with COUNTRY_ISO and AREA with OCEAN_IHO
Compartments
  • Added Aquatic Sediment, Sludge, and Porewater as subcompartments of Aquatic
Samples
  • Sample REPLICATE has been changed to SUBSAMPLE and now takes text values. This allows splitting samples into e.g. different sediment core depths, different combinations of tissues and organisms, etc.
Measurements
  • w/w dry weight and wet weight units (both conventional SI and ppm/ppb/etc.) can now be selected as options.
  • MEASURED_SD has been replaced with UNCERTAINTY_TYPE, UNCERTAINTY_UPPER, UNCERTAINTY_LOWER, and MEASURED_N, which allow more precise reporting of uncertainty (see the example after this list).
  • EXPERIMENTAL: As part of the extraction process, the LLM now scores both the paper and its own performance in extracting data from it. I don't know how effective this is, but I'd be interested to hear about your experiences with it.
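
For example, a reported mean of 12.3 (SD 2.1, n = 5) would map onto the new uncertainty columns roughly like this. The MEASURED_VALUE column name and the "Standard Deviation" label are assumptions, so use whatever the form and dropdowns actually offer:

    # Illustrative only: mean 12.3 with SD 2.1 from n = 5 samples
    data.frame(
      MEASURED_VALUE    = 12.3,
      UNCERTAINTY_TYPE  = "Standard Deviation",  # assumed dropdown label
      UNCERTAINTY_LOWER = 12.3 - 2.1,            # 10.2
      UNCERTAINTY_UPPER = 12.3 + 2.1,            # 14.4
      MEASURED_N        = 5
    )
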
CREED
  • CREED assessment is now possible in the Quality module. Currently it uses a set of Relevance criteria specific to Copper concentrations in Arctic and Near-Arctic compartments (EXPECT Project), but in the future this will be tweakable.
  • The final assessment results are displayed as a table and can be downloaded using the regular button.

  • This module uses the Claude Sonnet 4 Large Language Model (last updated 14th May 2025) to automatically extract environmental exposure data from uploaded PDFs. Upload a research paper or report, provide your Anthropic API key (note that this requires an Anthropic developer account, which is not the same as a subscription to Claude), and the system will attempt to populate all form fields automatically.
  • LLMs can and do make mistakes, including hallucinating information that is not present in the source document, so please carefully review and correct the extracted data in subsequent modules before validation.
  • If you don’t have an API key to hand, click “Load Dummy Data” for example data.
  • LLM extractions are stochastic, not deterministic. It is not guaranteed you will get the same output each time you extract data from the same source.
  • Note that LLM data extraction will overwrite existing manually entered data.

This is an experiment in getting the LLM to report on its own opinion of the extraction. I don't yet know how good its assessment is, but I'm interested to hear your feedback.
  • Enter basic campaign metadata for environmental sampling data. Required fields are marked in red.

  • Enter bibliographic information on your dataset.
  • Required fields change with reference type.
  • Use the DOI lookup or BibTeX import features to auto-populate fields when available.
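
If you want to sanity-check a DOI or a BibTeX file before trusting the auto-population, you can query the same services directly in R. This is only a rough sketch using the rcrossref and bib2df packages (both appear in the package list below); "refs.bib" is a made-up file name.

    # Fetch a BibTeX record for a DOI from Crossref
    library(rcrossref)
    cr_cn(dois = "10.1002/ieam.4909", format = "bibtex")

    # Read an existing BibTeX file into a data frame for a quick look
    library(bib2df)
    bib2df("refs.bib")
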

  • Enter sampling site information. Use COUNTRY_ISO for terrestrial sites and OCEAN_IHO for marine sites. If ambiguous, use both.
  • Add blank sites by clicking ‘Add Site(s)’ below, or selecting a point on the map (right) and clicking ‘Add’. Use an appropriate level of precision based on the source data - 3 d.p. (within 111 m) or 4 d.p. (within 11 m) may be most appropriate.
  • Edit fields directly in the table. Use the map to verify coordinates are correct.
  • Sites added via the table will appear on the map, and vice versa.
  • Delete unneeded rows with the right-click menu, and use the blue handle in the bottom-right of a selected cell to copy its contents to another cell (just like Excel!).
  • Click the Expand icon in the bottom-right of the map and table panels to view them in full screen.
  • When searching for locations by name, it can be easier to look them up in Google Maps. If you right-click a location there you can copy its coordinates to the clipboard. Do note that Google normally returns coordinates to 14 d.p., which is a completely unrealistic level of precision (see the example below).
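
For example, rounding a coordinate pair copied from Google Maps down to a sensible precision in R (the coordinates here are made up for illustration):

    # Example coordinates pasted from Google Maps (values are made up)
    lat <- 69.64960519811537
    lon <- 18.95508571988792
    round(c(lat, lon), 3)   # ~111 m precision: 69.650 18.955
    round(c(lat, lon), 4)   # ~11 m precision:  69.6496 18.9551
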

  • This module manages measured parameters (stressors, quality parameters, etc.). Select parameter type, subtype and name from dropdowns, then add to table. You can add existing parameters (with pre-filled chemical IDs) or create new ones.
  • Edit fields directly in the table. Stressor subtypes are derived from the ClassyFire taxonomy.
  • Search for parameters by name by selecting the ‘Parameter Name’ field, hitting Backspace to delete the current entry, then typing the name of your parameter.
  • LLM extraction tends to include any parameters in the paper, even if you specifically ask it not to. Please remove any irrelevant parameters from the table before continuing.

  • Enter environmental compartment combinations that will be sampled. Select an environmental compartment (e.g., Aquatic), then choose the appropriate sub-compartment (e.g., Freshwater) and measurement category (e.g. inside an organism, on the surface of an organism, or in the external environment).
  • Each combination represents a different sampling context used in the study or dataset. If there really isn’t a good option for your reported compartment, mark it as “Other” and let me know.
  • For biota, use the compartment they reproduce in. This is potentially a little ambiguous, but I interpret it as follows:
    • Aquatic: Fish, cetaceans, marine amphibians, etc.
    • Terrestrial: All land animals, birds (inc. penguins and seabirds), pinnipeds, etc.
    • Atmospheric: Maybe some bacteria? You are unlikely to need this category.
    • Other: Included just in case I have missed something obvious.
  • This is probably one of the most conceptually confusing parts of the workflow, so please let me know if you have any clever ideas to tweak it!

  • Sampling, fractionation, extraction, and analytical protocols can affect the final measured concentrations by orders of magnitude.
  • Please report the relevant protocol types/classes here. In the Measurements module, you will be able to match specific protocols to samples.
  • What’s the exact difference between extraction and fractionation? It’s not entirely clear to me, and different papers use different language. Methods may also report a series of techniques that don’t fit well into a single protocol option. I am open to suggestions and constructive criticism here.
  • LLM extraction is particularly bad at matching protocols to categories. Please exercise caution when validating results.

  • Create sample combinations by selecting sites, parameters, compartments, and sampling dates.
  • Each combination represents a specific sample that will be collected. Select multiple values from each category to generate all possible combinations.
  • Where samples need to be split further, use the SUBSAMPLE field. For example: 5 cm, 10 cm, 15 cm; trout liver, trout kidney, salmon liver, salmon kidney; etc. (see the sketch below).
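
As a rough illustration of how the combinations multiply out (the site names, depths, and the expand.grid call are just for illustration, not how the app works internally):

    # Two sites x one parameter x three core depths = six samples
    expand.grid(
      SITE        = c("Site_A", "Site_B"),
      PARAMETER   = "Cu",
      COMPARTMENT = "Aquatic Sediment",
      SUBSAMPLE   = c("0-5 cm", "5-10 cm", "10-15 cm")
    )
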

  • Enter detailed biological information for samples from living organisms.
  • All biota samples require species identification, tissue type, life stage, and gender information to be considered valid.
  • In cases where biodiversity indices are reported, use the Species Group “Ecosystem”.
  • Species not in the database can be added manually in the table, but please check if they are synonyms beforehand.
  • I’ve kept tissue types and life stage options intentionally short to avoid impractically large dropdowns. Mark as Other and add a comment if needed.
  • Additional comments judged relevant to biota can also be added.
Study Species Selection

First select a species group, then choose specific species for your study. Selected species will be available in the sample table below.

Species sampled in dataset

  • This module consolidates all setup data and presents sample-parameter combinations for measurement data entry.
  • Entered data can be downloaded at any time, even if not fully complete or valid. Click “Save Table Data” to make it available for download.
  • Enter measured concentrations, detection limits, and associated metadata for each combination. Where levels below detection or quantification (LOD, LOQ) limits are reported, use the MEASURED_FLAG dropdowns and dedicated LOD/LOQ columns.
  • Numbers are displayed to 4 d.p. by default, but full precision is retained behind the scenes (i.e. values keep more decimal places than are shown).
  • Columns can be sorted by clicking the column name (numeric, chronological, alphabetical order).
  • Due to issues with data entry, I’ve disabled some visual validation feedback temporarily.
  • If summarised data are reported (averages, SDs, ranges, etc.), report these using the new UNCERTAINTY_UPPER/_LOWER and MEASURED_N columns. Where MEASURED_N = 1 or is otherwise too low for summary statistics, report UNCERTAINTY_TYPE as "Not Relevant" (see the example below).
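
As a made-up example, here is how two rows might look: one ordinary value and one below the detection limit. The exact column and flag names here are assumptions; follow whatever the table and MEASURED_FLAG dropdown actually show.

    # Illustrative rows only; column and flag names are assumptions
    data.frame(
      MEASURED_VALUE    = c(0.84, NA),
      MEASURED_FLAG     = c("", "<LOD"),
      LOD               = c(0.05, 0.05),
      MEASURED_N        = c(3, 1),
      UNCERTAINTY_TYPE  = c("Standard Deviation", "Not Relevant"),
      UNCERTAINTY_LOWER = c(0.71, NA),
      UNCERTAINTY_UPPER = c(0.97, NA)
    )
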
Measurement Data Entry

Complete measurement data for all sample-parameter combinations below. All fields marked as required must be filled.

Some data populated from LLM extraction - please review for accuracy
  • Explore measured values across different parameters, sites, compartments, and sampling dates (early prototype).
  • Use the controls below to customize the visualisations and identify any data patterns or potential issues before export.
Visualization Controls
Measurement Data Overview
Data Summary
Data Completeness & Quality Indicators


This module provides a Criteria for Reporting and Evaluating Exposure Datasets (CREED) assessment (version 1), based on Di Paolo et al., 2024. The Dataset Details section summarises key characteristics of your dataset that are relevant for quality evaluation. Fields are auto-populated where possible from your imported data. This module is currently designed for exposure assessment of copper, and the data summarised may not be relevant for other pollutants (especially non-metals).

The CREED assessment process assumes that exposure data are being assessed in the context of a broader chemical/ecological assessment, and thus not all criteria will necessarily be relevant to your needs. Where that is the case, you are encouraged to record this in the relevant fields.

CREED assigns a dataset Gold status if all Recommended criteria (11 reliability, 4 relevance) are met, or Silver status if all Required criteria (7 reliability, 7 relevance) are met. Once you complete this module, you will be able to mark your dataset as Gold or Silver if it meets the appropriate criteria.
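
In code terms, the grading works roughly like this (a minimal sketch of my reading of the scheme, assuming that Gold also requires the Required criteria to be met):

    # Sketch of the Gold/Silver logic; assumes Gold implies the Required
    # criteria are also met
    required_met    <- rep(TRUE, 7 + 7)    # 7 reliability + 7 relevance
    recommended_met <- rep(TRUE, 11 + 4)   # 11 reliability + 4 relevance

    if (all(required_met) && all(recommended_met)) {
      "Gold"
    } else if (all(required_met)) {
      "Silver"
    } else {
      "Not graded"
    }
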

Required criteria are needed for Silver level scoring. Recommended criteria are additional requirements for Gold level scoring.

Auto-population: Fields marked with this icon are auto-populated from data entered in earlier modules. These data can be overwritten as needed, but note that if you auto-populate the fields again, your manual changes will be lost.

1. Purpose Statement
  • The CREED grading process begins with laying out the purpose of your overall chemical/ecological impact or risk assessment, which shapes the criteria used.
  • As it is usual to include multiple datasets in such an assessment, we recommend reusing a single Purpose Statement per assessment.

Describe the objective for which the usability of the dataset is assessed, including any required dataset thresholds. CREED Relevance Questions (RV01 - RV11) are based on the study purpose. Reliability Questions (RB01 - RB19) are common across all studies.


You can usually import/export purpose statements here, but since we're just doing a copper AEP for now I've hard-coded it.

Enter thresholds for 'Partly Met' (minimum requirements) and 'Fully Met' (optimal requirements). Leave fields blank if no specific threshold is needed.

RV1: Was the sampling medium/matrix appropriate for the given purpose?
RV2: Was the sample collection method adequate for the given purpose?
RV3: Were the study area and number of locations sampled suitable for the given purpose?
RV4: Was the rationale for selection of sampling locations provided and is it suitable for the given purpose?
RV5: Were the samples collected over a time scale that was appropriate for the given purpose?
RV6: Over the timespan, was the sampling frequency appropriate for the given purpose?
RV7: Were conditions during sampling events documented and relevant for the given purpose?
RV8: Was/were the analyte(s) reported appropriate for the given purpose?
RV9: The method was sensitive enough for the given purpose
RV10: The summary statistics provided were appropriate for the given purpose
RV11: All supporting parameters that were needed to achieve the given purpose were provided
2. Dataset Details - Key Attributes
  • This section provides a summary of basic details for the dataset.
  • Review the auto-populated fields below and add any missing information.

3. Gateway Criteria
  • CREED's gateway criteria are designed to allow for the easy rejection of a study without requiring a methodical examination.
  • Most studies processed using this tool can be expected to pass these criteria without issue.
  • Nevertheless, they are included for the sake of completeness.
  • Each criterion is auto-evaluated based on your entered data, but can be manually overridden.

1. Does the study specify which medium/matrix is sampled?
Relevant data:
2. Does the study specify which unique analyte is measured?
Relevant data:
3. Does the study specify where samples were collected?
At a minimum, there is enough information for the given purpose (e.g., country).
Relevant data:
4. Does the study indicate when samples were collected?
At a minimum, there is enough information for the given purpose (e.g., sampling year).
Relevant data:
5. Does the study specify units of measurement?
Relevant data:
6. Does the study cite the source of data and/or is a suitable bibliographic reference available for the study?
Relevant data:
4. Reliability Criteria

Assess how reliable the dataset is for answering your assessment questions.

RB1: Sample Medium/Matrix (Required)

Criterion: Was the sampling medium/matrix reported in detail (for water: dissolved fraction or whole water; for sediment: sieved or whole; for soil: grain size; for biota: species, age, sex, tissue type), and was the matrix appropriate for the analyte of interest?

RB2: Collection Method/Sample Type (Recommended)

Criterion: Was the sample collection method reported? Examples include grab, depth- and width-integrated, discrete, composite, or time-integrated samples, or continuous monitoring.

RB3: Sample Handling (Recommended)

Criterion: Was information reported on sample handling (transport conditions, preservation, filtration, storage)? Was the type of container suitable for use with the analyte of interest (i.e., no loss or contamination)?

RB4: Site Location (Required)

Criterion: Were the site locations reported?

RB5: Date and Time (Required)

Criterion: Were the date and time of sample collection reported?

RB6: Analyte(s) Measured (Required)

Criterion: Was/were the analyte(s) of interest suitably and definitively identified?

RB7: Limit of Detection and/or Limit of Quantification (Required)

Criterion: Were limits of detection and/or quantification provided?

RB8: Accreditation/Quality Management System (Required - Shortcut Criterion)

Criterion: Were the laboratory and method accredited for all or almost all samples? Several national and international accreditation bodies are available (e.g., ISO, UKAS); were the laboratory and/or method certified to these standards? Was a quality system (e.g., ISO 17025) adopted?

Shortcut Criterion: If you answer 'Fully Met' to this question, you may skip questions RB9-RB12. If not, please complete those questions.

RB9: Method (Required)

Criterion: Was the method sufficiently described or referenced, such that it can be reproduced if necessary? Was method validation included?

RB10: Lab Blank Contamination (Recommended)

Criterion: Was method blank contamination assessed with laboratory blanks?

RB11: Recovery/Accuracy (Recommended)

Criterion: Were method recovery/accuracy and/or uncertainty assessed by recovery of standard reference material (SRM) and/or were lab spike samples assessed?

RB12: Reproducibility/Precision (Recommended)

Criterion: Were method reproducibility and/or uncertainty assessed with lab replicates and long-term control recoveries?

RB13: Field QC (Recommended)

Criterion: Were quality control (QC) samples collected during field sampling (such as field blanks, spikes, replicates) to demonstrate the method performance for a given field study?

RB14: Calculations (if dataset contains calculated values) (Recommended)

Criterion: If chemical concentrations were normalised or adjusted (e.g., to represent bioavailability or toxicity), then were the calculations explained and were they appropriate?

RB15: Significant Figures (if dataset contains calculated values) (Recommended)

Criterion: During calculations, were data reported to the appropriate number of significant figures or decimal places?

RB16: Outliers (if dataset mentions outliers) (Recommended)

Criterion: For any outliers deleted from the data set, was evidence provided that these outliers were due to an error in measurement or contamination?

RB17: Censored Data (if dataset contains censored values) (Required)

Criterion: Were censored data reported correctly (e.g., as a numerical value plus a less-than sign or another indicator of a nondetect)? If a substitution method was used for nondetects (e.g., censored data were replaced by zero, or by 1/2 or another fraction of the LOD/LOQ), then can the original censored data be restored by back-calculation using the reported LOD/LOQ?
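
As a small worked example of the back-calculation check (the LOD and values are made up): if nondetects were substituted with 1/2 the LOD, any reported value equal to LOD/2 can be restored to its censored form.

    # Made-up values: nondetects were reported as 1/2 LOD (LOD = 0.10)
    lod      <- 0.10
    reported <- c(0.32, 0.05, 0.27, 0.05)
    # use a small tolerance rather than == for floating-point safety
    is_nondetect <- abs(reported - lod / 2) < 1e-9
    ifelse(is_nondetect, paste0("<", lod), reported)
    # "0.32" "<0.1" "0.27" "<0.1"
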

RB18: Summary Statistics Procedures (if dataset contains summary statistics) (Recommended)

Criterion: Were summary statistics calculated appropriately? If the dataset contained censored data, then were censored data included and were appropriate procedures used to determine summary statistics?

RB19: Supporting Data Quality (if supporting parameters are required for the purpose) (Recommended)

Criterion: If any supporting parameters are required for the assessment purpose, then were the supporting parameter data provided, and were their methods and data quality addressed?

5. Relevance Criteria

Assess how relevant the dataset is to the purpose as described in your Purpose Statement.

RV1: Was the sampling medium/matrix appropriate? (Required)

Criterion: Was the sampling medium/matrix appropriate for the given purpose?

RV2: Collection Method/Sample Type (Recommended)

Criterion: Was the sample collection method adequate for the given purpose?

RV3: Study Area (Required)

Criterion: Were the study area and number of locations sampled suitable for the given purpose?

RV4: Site Type (Recommended)

Criterion: Was the rationale for selection of sampling locations provided and was it suitable for the given purpose?

RV5: Sampling Timespan (Required)

Criterion: Were the samples collected over a time scale that was appropriate for the given purpose?

RV6: Sampling Frequency (Required)

Criterion: Over the timespan, was the sampling frequency appropriate for the given purpose?

RV7: Temporal Conditions (Recommended)

Criterion: Were conditions during sampling events documented and relevant for the given purpose?

RV8: Analyte (Required)

Criterion: Was/were the reported analyte(s) appropriate for the given purpose?

RV9: Sensitivity/LOD/LOQ (Required)

Criterion: The method was sensitive enough for the given purpose

RV10: Summary Statistics Type (Recommended)

Criterion: The summary statistics provided were appropriate for the given purpose

RV11: Supporting Parameters (Required)

Criterion: All supporting parameters that were needed to achieve the given purpose were provided

FAQ

Something’s gone wrong with the extraction!

Extraction reporting currently isn't very good. Check your internet connection, whether your PDF is corrupted, and whether the Claude service is up. If the Raw Data Extraction is showing a lot of NULL and NA values, the paper may be too big for the extraction token limit (i.e. how much information the LLM can take on at one time). Let me know if that's the case.

I can’t find the unit/organism/compartment/etc. I need!

Use the most appropriate available option for now, and let me know.

Why can we extract site, parameter, campaign, etc. data but not actual measurements?

This is something I’m working on, but it’s harder than simple data like the above.

A module is showing that my data isn’t validated, but I think it is.

Send me a screenshot + a copy of your files.

The screen has gone grey/the app has crashed.

My mistake! But I’m very interested in hearing how this happened and what you were doing when it did. Please let me know ASAP.

Methods or protocols?

I’m not really sure. It seems there are methods, protocols and techniques. LibreTexts Chemistry says:

A technique is any chemical or physical principle that we can use to study an analyte. A method is the application of a technique for a specific analyte in a specific matrix. A procedure is a set of written directions that tell us how to apply a method to a particular sample. Finally, a protocol is a set of stringent guidelines that specify a procedure that an analyst must follow if an agency is to accept the results.

So I am certainly using the term “protocol” wrong here. But I haven’t reworked this section yet because, frankly, I don’t know who will be using the app. But I’m open to input.

The following packages, software, and data resources were used in this application:

US EPA ECOTOX Knowledgebase

Olker, J. H., Elonen, C. M., Pilli, A., Anderson, A., Kinziger, B., Erickson, S., Skopinski, M., Pomplun, A., LaLone, C. A., Russom, C. L., & Hoff, D. (2022). The ECOTOXicology Knowledgebase: A Curated Database of Ecologically Relevant Toxicity Tests to Support Environmental Research and Risk Assessment. Environmental Toxicology and Chemistry, 41(6):1520-1539. https://doi.org/10.1002/etc.5324

Curated toxicity data (species, chemicals) were retrieved from the ECOTOXicology Knowledgebase, U.S. Environmental Protection Agency. http://www.epa.gov/ecotox/ (2025.06.12).

ClassyFire Chemical Taxonomy

Djoumbou Feunang Y, Eisner R, Knox C, Chepelev L, Hastings J, Owen G, Fahy E, Steinbeck C, Subramanian S, Bolton E, Greiner R, and Wishart DS. ClassyFire: Automated Chemical Classification With A Comprehensive, Computable Taxonomy. Journal of Cheminformatics, 2016, 8:61. DOI: 10.1186/s13321-016-0174-y

Criteria for Reporting and Evaluating Exposure Datasets (CREED)

Carolina Di Paolo, Irene Bramke, Jenny Stauber, Caroline Whalley, Ryan Otter, Yves Verhaegen, Lisa H. Nowell, Adam C. Ryan, Implementation of the CREED approach for environmental assessments, Integrated Environmental Assessment and Management, Volume 20, Issue 4, 1 July 2024, Pages 1019–1034, https://doi.org/10.1002/ieam.4909

IHO Sea Areas

Flanders Marine Institute (2018). IHO Sea Areas, version 3. Available online at https://www.marineregions.org/. https://doi.org/10.14284/323.

ISO Countries

ISO 3166: Country Codes (https://www.iso.org/iso-3166-country-codes.html) Imported via the R Package ISOcodes (https://cran.r-project.org/web/packages/ISOcodes/index.html)

Sounds

Beep: https://pixabay.com/sound-effects/beep-329314/, u_edtmwfwu7c

R packages used

Allaire, JJ. 2023. config: Manage Environment Specific Configuration Values. https://doi.org/10.32614/CRAN.package.config.

Atkins, Aron, Toph Allen, Hadley Wickham, Jonathan McPherson, and JJ Allaire. 2025. rsconnect: Deploy Docs, Apps, and APIs to “Posit Connect,” “shinyapps.io,” and “RPubs”. https://doi.org/10.32614/CRAN.package.rsconnect.

Attali, Dean. 2021. shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds. https://doi.org/10.32614/CRAN.package.shinyjs.

Barbone, Jordan Mark, and Jan Marvin Garbuszus. 2025. Openxlsx2: Read, Write and Edit “xlsx” Files. https://janmarvin.github.io/openxlsx2/.

Buchta, Christian, and Kurt Hornik. 2025. ISOcodes: Selected ISO Codes. https://doi.org/10.32614/CRAN.package.ISOcodes.

Chamberlain, Scott, Hao Zhu, Najko Jahn, Carl Boettiger, and Karthik Ram. 2025. rcrossref: Client for Various “CrossRef” “APIs”. https://doi.org/10.32614/CRAN.package.rcrossref.

Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2025. shiny: Web Application Framework for r. https://doi.org/10.32614/CRAN.package.shiny.

Cheng, Joe, Winston Chang, Steve Reid, James Brown, Bob Trower, and Alexander Peslyak. 2025. httpuv: HTTP and WebSocket Server Library. https://doi.org/10.32614/CRAN.package.httpuv.

Cheng, Joe, Barret Schloerke, Bhaskar Karambelkar, and Yihui Xie. 2024. leaflet: Create Interactive Web Maps with the JavaScript “Leaflet” Library. https://doi.org/10.32614/CRAN.package.leaflet.

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. htmltools: Tools for HTML. https://doi.org/10.32614/CRAN.package.htmltools.

Csárdi, Gábor, Kirill Müller, and Jim Hester. 2023. desc: Manipulate DESCRIPTION Files. https://doi.org/10.32614/CRAN.package.desc.

Eddelbuettel, Dirk. 2024. digest: Create Compact Hash Digests of r Objects. https://doi.org/10.32614/CRAN.package.digest.

Fay, Colin. 2020. attempt: Tools for Defensive Programming. https://doi.org/10.32614/CRAN.package.attempt.

Fay, Colin, Vincent Guyader, Sébastien Rochette, and Cervan Girard. 2024. golem: A Framework for Robust Shiny Applications. https://doi.org/10.32614/CRAN.package.golem.

Gagolewski, Marek. 2022. “stringi: Fast and Portable Character String Processing in R.” Journal of Statistical Software 103 (2): 1–59. https://doi.org/10.18637/jss.v103.i02.

Garbett, Shawn P, Jeremy Stephens, Kirill Simonov, Yihui Xie, Zhuoer Dong, Hadley Wickham, Jeffrey Horner, et al. 2024. yaml: Methods to Convert r Data to YAML and Back. https://doi.org/10.32614/CRAN.package.yaml.

Guyader, Vincent, Sébastien Rochette, Murielle Delmotte, and Swann Floc’hlay. 2025. attachment: Deal with Dependencies. https://doi.org/10.32614/CRAN.package.attachment.

Hester, Jim, and Jennifer Bryan. 2024. glue: Interpreted String Literals. https://doi.org/10.32614/CRAN.package.glue.

Hester, Jim, Lionel Henry, Kirill Müller, Kevin Ushey, Hadley Wickham, and Winston Chang. 2024. withr: Run Code “With” Temporarily Modified Global State. https://doi.org/10.32614/CRAN.package.withr.

Müller, Kirill. 2020. here: A Simpler Way to Find Your Files. https://doi.org/10.32614/CRAN.package.here.

Ottolinger, Philipp. 2024. Bib2df: Parse a BibTeX File to a Data Frame. https://doi.org/10.32614/CRAN.package.bib2df.

Owen, Jonathan. 2021. rhandsontable: Interface to the “Handsontable.js” Library. https://doi.org/10.32614/CRAN.package.rhandsontable.

Perrier, Victor, Fanny Meyer, and David Granjon. 2025. shinyWidgets: Custom Inputs Widgets for Shiny. https://doi.org/10.32614/CRAN.package.shinyWidgets.

R Core Team. 2025a. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Richardson, Neal, Ian Cook, Nic Crane, Dewey Dunnington, Romain François, Jonathan Keane, Dragoș Moldovan-Grünfeld, Jeroen Ooms, Jacob Wujciak-Jens, and Apache Arrow. 2025. arrow: Integration to “Apache” “Arrow”. https://doi.org/10.32614/CRAN.package.arrow.

Rodriguez-Sanchez, Francisco, and Connor P. Jackson. 2024. grateful: Facilitate Citation of R Packages. https://pakillo.github.io/grateful/.

Schloerke, Barret. 2025. Shinytest2: Testing for Shiny Applications. https://doi.org/10.32614/CRAN.package.shinytest2.

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with r, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.

———. 2023. bsicons: Easily Work with “Bootstrap” Icons. https://doi.org/10.32614/CRAN.package.bsicons.

Sievert, Carson, Joe Cheng, and Garrick Aden-Buie. 2025. bslib: Custom “Bootstrap” “Sass” Themes for “shiny” and “rmarkdown”. https://doi.org/10.32614/CRAN.package.bslib.

Sievert, Carson, Richard Iannone, and Joe Cheng. 2023. shinyvalidate: Input Validation for Shiny Apps. https://doi.org/10.32614/CRAN.package.shinyvalidate.

Wickham, Hadley. 2011. “testthat: Get Started with Testing.” The R Journal 3: 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Jennifer Bryan, Malcolm Barrett, and Andy Teucher. 2024. usethis: Automate Package and Project Setup. https://doi.org/10.32614/CRAN.package.usethis.

Wickham, Hadley, Winston Chang, Jim Hester, and Lionel Henry. 2024. pkgload: Simulate Package Installation and Attach. https://doi.org/10.32614/CRAN.package.pkgload.

Wickham, Hadley, Joe Cheng, Aaron Jacobs, Garrick Aden-Buie, and Barret Schloerke. 2025. ellmer: Chat with Large Language Models. https://doi.org/10.32614/CRAN.package.ellmer.

Wickham, Hadley, Jim Hester, Winston Chang, and Jennifer Bryan. 2022. devtools: Tools to Make Developing r Packages Easier. https://doi.org/10.32614/CRAN.package.devtools.

Xie, Yihui, JJ Allaire, and Jeffrey Horner. 2025. markdown: Render Markdown with “commonmark”. https://doi.org/10.32614/CRAN.package.markdown.

Xie, Yihui, Joe Cheng, and Xianying Tan. 2024. DT: A Wrapper of the JavaScript Library “DataTables”. https://doi.org/10.32614/CRAN.package.DT.

STOP eData is developed by Sam Welch with assistance from Viviane Giradin and Knut Erik Tollefsen. Work on this tool has been funded by PARC, EXPECT, and NCTP. This app is currently in development and no responsibility is accepted for crashes or data loss. App Version: 0.0.0.9006.