Frequently Asked Questions

How do I get data into EDD?

The Experiment Data Depot (EDD) imports data in two steps (Fig. 1):

  1. Experiment Description input: an experiment description file describes your experiment design so EDD knows how to store your data and how the data relate to your strains and samples (see below for more information).

  2. Data input: different types of data can be added in several successive steps. These data input steps are independent of each other, facilitating the combination of different types of data (e.g. multiomics data sets).

You can find tutorials and protocols for study creation and data import.

Fig. 1: Data input process. Data is imported to EDD in two phases. In the first, you import an experiment description file, which describes the experiment to EDD so it knows how to store your data. Afterwards, you can add as many data types (e.g. transcriptomics, proteomics, …) as desired through successive data imports.

What is an experiment description?

An experiment description file is an Excel file that describes your experiment (Fig. 2): which strains you are using (part ID from ICE), how they are being cultured (lines and metadata), which samples are being taken (assays) and how they are processed (protocol). See Fig. 3 for how EDD organizes your experimental data (i.e. the ontology).

The experiment description provides a standardized, single-file description of your experiment. It is useful, e.g., for designing your experiment, or for the proteomics or metabolomics team to understand your experiment and plan how to process your samples.

Fig. 2 panels: (top) input in Excel; (bottom) import result in EDD.
Fig. 2: Example experiment description. The upper picture shows the example experiment description file in Excel, with a line name that helps identify the culture, a line description that gives more information on the line, the part ID in the corresponding part repository (public ABF in this case), different types of metadata (shaking speed, … growth temperature), the number of replicates, and an optional field (in blue) with assay information (i.e. a protocol applied to a line at a given time point) for targeted proteomics. The replicate count creates one line per replicate (3 for the wild type and 4 for the other strain, see below). The assay information is optional: you may use it to tell the proteomics or metabolomics services when you are sampling so they can add the data, or you can add the data later yourself. The lower picture shows how this information is represented in EDD. Notice that the part IDs have become links to the corresponding registry.
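To make the replicate rule concrete, here is a minimal Python sketch of how a replicate count expands into individual lines, using the 3 + 4 replicates from Fig. 2. The DescriptionRow class, the expand_replicates function and the "-R<n>" suffix are hypothetical illustrations; EDD performs this expansion itself on import and may name replicate lines differently.

```python
from dataclasses import dataclass


@dataclass
class DescriptionRow:
    """One row of an experiment description (only the fields used here)."""
    line_name: str
    replicate_count: int = 1


def expand_replicates(rows: list[DescriptionRow]) -> list[str]:
    """Return one line name per replicate.

    The "-R<n>" suffix is a hypothetical naming scheme for illustration;
    EDD does this expansion itself and may name replicate lines differently.
    """
    names = []
    for row in rows:
        if row.replicate_count < 1:
            raise ValueError(f"{row.line_name}: replicate count must be at least 1")
        names.extend(f"{row.line_name}-R{n}" for n in range(1, row.replicate_count + 1))
    return names


# Three wild-type replicates and four replicates of the other strain, as in Fig. 2.
rows = [DescriptionRow("WT", 3), DescriptionRow("Other", 4)]
lines = expand_replicates(rows)
print(len(lines), lines[:3])  # 7 ['WT-R1', 'WT-R2', 'WT-R3']
```

Each row stays a single entry in the description file; only the expanded list corresponds to individual lines.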

What is a line?

A “Line” in EDD is a distinct set of experimental conditions (e.g. a single culture). A Line generally corresponds to the contents of a shake flask or a well in a plate, though it could also be, e.g., a tube containing an Arabidopsis seed or an ionic liquid for a given pretreatment. A line is not a sample: several samples can be obtained from a single line at different times (see Fig. 3).

A typical experiment (Fig. 3) takes strains from a repository, cultures them in different flasks (lines), applies a protocol at a given time (an assay), and obtains different measurement data. Protocols are kept on protocols.io to enable reproducibility and better communication. You can find the LBNL repository here.

Fig. 3: EDD data organization (ontology). In this example, we have three different strains (A, B, and C). Strain A is cultured in two different flasks, giving rise to two lines (A1 and A2). Strain B is cultured in a single flask, giving rise to a single line, B1. Strain C is cultured in three flasks, giving rise to three lines: C1, C2 and C3. Line A2 is assayed with the HPLC protocol at t = 10 hr (assay A2-HPLC-1) and t = 8 hr (assay A2-HPLC-2). Assay A2-HPLC-1 produces two measurements: 3 mg/L of acetate and 2 mg/L of lactate. Assay A2-HPLC-2 produces two measurements: 2 mg/L of acetate and 1.5 mg/L of lactate.
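The hierarchy in Fig. 3 (strain, line, assay, measurement) can be sketched as nested Python dataclasses. This is only an illustration of the relationships: the class and field names are hypothetical and do not reflect EDD's internal data model, while the lines, assays and values are those of the Fig. 3 example.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    compound: str
    value: float
    units: str


@dataclass
class Assay:
    """A protocol applied to a line at one time point (hypothetical model)."""
    name: str
    protocol: str
    time_h: float
    measurements: list[Measurement] = field(default_factory=list)


@dataclass
class Line:
    """One culture of a strain; it can be sampled by many assays."""
    name: str
    strain: str
    assays: list[Assay] = field(default_factory=list)


# Lines from Fig. 3: strain A -> A1, A2; strain B -> B1; strain C -> C1, C2, C3.
lines = [Line(name, name[0]) for name in ["A1", "A2", "B1", "C1", "C2", "C3"]]
a2 = next(line for line in lines if line.name == "A2")

# Line A2 is sampled twice with the HPLC protocol.
a2.assays.append(Assay("A2-HPLC-1", "HPLC", 10.0, [
    Measurement("Acetate", 3.0, "mg/L"),
    Measurement("Lactate", 2.0, "mg/L"),
]))
a2.assays.append(Assay("A2-HPLC-2", "HPLC", 8.0, [
    Measurement("Acetate", 2.0, "mg/L"),
    Measurement("Lactate", 1.5, "mg/L"),
]))

for assay in a2.assays:
    for m in assay.measurements:
        print(f"{a2.name} {assay.name} t={assay.time_h} h: {m.compound} {m.value} {m.units}")
```

The key design point mirrors the text: measurements hang off assays, not off lines, so one line can yield data at many time points.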

How do I choose good line names?

A good way to name your lines is to combine the strain name, the culture conditions and whichever other condition is being varied in the experiment. For example, WT-LB-70C would indicate a wild-type strain grown on LB at 70 °C (imagine you are testing different growth temperatures), and Cineole-EZ-50C a cineole-producing strain grown on EZ at 50 °C.
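If you generate many line names, for example for a temperature series, a small helper keeps them consistent. This is a minimal Python sketch, assuming the strain-medium-temperature pattern from the examples above; the make_line_name function is hypothetical and not part of EDD.

```python
def make_line_name(strain: str, medium: str, temperature_c: int) -> str:
    """Build a line name such as "WT-LB-70C" (hypothetical helper, not an EDD API)."""
    # Hyphens separate the parts, so a strain or medium containing one would be ambiguous.
    for part in (strain, medium):
        if not part or "-" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"{strain}-{medium}-{temperature_c}C"


# One line name per tested growth temperature.
names = [make_line_name("WT", "LB", t) for t in (50, 60, 70)]
print(names)  # ['WT-LB-50C', 'WT-LB-60C', 'WT-LB-70C']
print(make_line_name("Cineole", "EZ", 50))  # Cineole-EZ-50C
```

Adapt the parameters to whatever your experiment actually varies (inducer concentration, medium, time point); the point is that the varied condition is always encoded in the same position.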

What are the column options for experiment description?

The primary line characteristics, which every experiment description should include and every EDD service (instance) supports, are:

  • Line Name: a short name that uniquely identifies the line (REQUIRED).
  • Line Description: A short human-readable description for the line (encouraged).
  • Part ID: the unique ICE part number identifiers for the strains involved (encouraged).
  • Replicate Count: the number of experimental replicates for this set of experimental conditions (encouraged).

Other metadata types (e.g. media, temperature, culture volume, flask volume, shaking speed, etc.) are also available, but they depend on which EDD site you are using. Ask your EDD administrator for more information. Columns can be in any order.
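Before uploading, you may want to catch simple mistakes such as a missing or duplicated Line Name. The sketch below is a hypothetical pre-upload check, not EDD's own validation (EDD checks the file itself on import). It assumes you have exported the Excel sheet to CSV; the column names are the ones listed above, and columns are looked up by header so their order does not matter.

```python
import csv
import io

REQUIRED = "Line Name"
ENCOURAGED = ("Line Description", "Part ID", "Replicate Count")


def check_description(text: str) -> list[str]:
    """Return problems found in a CSV export of an experiment description (hypothetical check)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    if REQUIRED not in columns:
        return [f"missing required column: {REQUIRED}"]
    problems = [f"missing encouraged column: {c}" for c in ENCOURAGED if c not in columns]
    seen = set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        name = (row[REQUIRED] or "").strip()
        if not name:
            problems.append(f"row {row_number}: empty Line Name")
        elif name in seen:
            problems.append(f"row {row_number}: duplicate Line Name {name!r}")
        else:
            seen.add(name)
        # Replicate Count is optional, but if present it must be a positive integer.
        count = (row.get("Replicate Count") or "").strip()
        if count and not (count.isdigit() and int(count) >= 1):
            problems.append(f"row {row_number}: bad Replicate Count {count!r}")
    return problems


example = """Line Name,Line Description,Part ID,Replicate Count
WT-LB-70C,Wild type on LB,,3
WT-LB-70C,Duplicate name,,two
"""
for problem in check_description(example):
    print(problem)
```

Only the four columns available in every EDD instance are checked; site-specific metadata columns are ignored because their names depend on your EDD site.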


Why should I use the Experiment Data Depot?

The Experiment Data Depot (EDD) is a standardized repository of experimental data. This is useful for the following reasons:

  • EDD provides a single point of storage for your experimental data, so it can be easily referenced. Instead of providing a collection of spreadsheets organized in an ad hoc manner in the supplementary material of your paper, you can give a single URL where your readers can find all the data in a consistent format. This can make your papers more likely to be cited, in the same way that storing your strain information in the Inventory of Composable Elements (ICE) makes it easier to access and more likely to be cited.

  • Easily collate different types of multiomics data. Comparing the results of phenotyping a cell using transcriptomics, proteomics and metabolomics can be complicated. EDD facilitates this task by using a standard vocabulary for genes, proteins and metabolites, which makes multiomics data much easier to combine.

  • EDD facilitates data analysis. Because EDD stores data in a standard format, you can reuse previously created Jupyter notebooks to easily do your calibrations and statistics (e.g. calculate error bars).

  • Enable advanced learning techniques. EDD helps you work with data scientists who use Machine Learning and Artificial Intelligence techniques to guide metabolic engineering. Just give them the link to your study and you will save them the spreadsheet wrangling that consumes 50-80% of their time.
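As an example of the kind of statistics mentioned in the list above, an error bar for one time point usually comes down to the mean and spread of the replicate measurements. This is a minimal Python sketch using only the standard library; the error_bar function and the titer values are hypothetical, and it is not taken from the EDD notebooks, which may compute errors differently.

```python
import math
import statistics


def error_bar(replicates: list[float]) -> tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates for an error bar")
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(len(replicates))
    return mean, sd, sem


# Hypothetical acetate titers (mg/L) from three replicate lines at one time point.
mean, sd, sem = error_bar([3.0, 2.5, 3.5])
print(f"{mean:.2f} ± {sd:.2f} mg/L (SEM {sem:.2f})")  # 3.00 ± 0.50 mg/L (SEM 0.29)
```

Whether to plot the standard deviation or the standard error is a choice to state in your figure legend: the former describes replicate spread, the latter the uncertainty of the mean.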

Why can't I open a Study from a link someone sent me?

You may not have the correct permissions to view the Study. Ask the person who sent you the link to give you read permissions.

What is a slug?

A slug is a short, human-readable identifier for a Study that can be used in links. For example, with the slug pcap, a link to a study looks like this:

https://public-edd.jbei.org/s/pcap/overview/

instead of a link to the same study that uses its numeric ID, like this:

https://public-edd.jbei.org/study/2843/overview/
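If you build or parse study links in scripts, the two URL forms above can be handled with a few lines of Python. This is a sketch under stated assumptions: the /s/<slug>/overview/ and /study/<id>/overview/ patterns are taken only from the two example links above, the host is the public EDD instance (other EDD sites use their own host), and study_url and study_reference are hypothetical helpers, not an EDD API.

```python
from typing import Union
from urllib.parse import urlparse

# Host of the public EDD instance from the examples; replace it for other EDD sites.
BASE = "https://public-edd.jbei.org"


def study_url(identifier: Union[str, int]) -> str:
    """Build an overview link from a slug (str) or a numeric study ID (int)."""
    if isinstance(identifier, int):
        return f"{BASE}/study/{identifier}/overview/"
    return f"{BASE}/s/{identifier}/overview/"


def study_reference(url: str) -> tuple[str, str]:
    """Return ("slug", value) or ("id", value) for a study link of either form."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "s":
        return "slug", parts[1]
    if len(parts) >= 2 and parts[0] == "study" and parts[1].isdigit():
        return "id", parts[1]
    raise ValueError(f"not a recognized study link: {url}")


print(study_url("pcap"))  # https://public-edd.jbei.org/s/pcap/overview/
print(study_url(2843))    # https://public-edd.jbei.org/study/2843/overview/
print(study_reference("https://public-edd.jbei.org/s/pcap/overview/"))  # ('slug', 'pcap')
```

Both links point to the same study; the slug form is simply easier to read and to quote in a paper.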