Technology WG Notes Jan. 31-Feb. 1, 2005

The Technology working group considered biological and contextual data elements that are important to studies of the water column versus the benthos. Technical challenges guided these discussions, and an asterisk ("*") marks each essential contextual data element:
  1. Water column data
    1. Contextual information
      1. Location* -
      2. Latitude, longitude*
      3. Depth*
      4. Sampling methodology - see section on Sampling*
      5. Important physico-chemical parameters
        1. Conductivity*
        2. Temperature*
        3. Salinity*
        4. Density*
        5. pH*
        6. Time*
        7. Light intensity/depth*
      6. Desirable parameters
        1. fluorescence
          1. In situ fluorescence - Chlorophyll measurement.
          2. Variable fluorometry - fluorescence kinetics and quantum yield measurements. Maps the activity of primary production in situ. (Falkowski)
        2. Oxygen
        3. Alkalinity
        4. DOC
        5. POC
        6. sulphides
        7. sulphates
        8. nitrates
        9. phosphates
        10. turbidity to estimate particulate matter.
        11. Current - Relates to time series measurement. If sample drifts with current, will collect from same populations. If fixed, the current will sweep in different populations.
        12. Atmospheric data: Temp, humidity, wind strength and direction, solar radiation
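The starred elements above amount to a minimum record that every water column sample should carry. A minimal sketch of how a submission could be checked for them follows; the field names and the sample values are illustrative assumptions, not an agreed ICoMM standard.

```python
# Essential ("*") water column elements from the list above.
# Field names are hypothetical; the notes do not define a schema.
REQUIRED = (
    "location", "latitude", "longitude", "depth_m", "sampling_method",
    "conductivity", "temperature_c", "salinity", "density", "ph",
    "time_utc", "light_intensity",
)


def missing_essentials(record):
    """Return the essential fields that are absent or None in a sample record."""
    return [name for name in REQUIRED if record.get(name) is None]


# Made-up record for illustration only.
sample = {
    "location": "hypothetical station A",
    "latitude": 31.67, "longitude": -64.17, "depth_m": 200.0,
    "sampling_method": "Niskin bottle",
    "temperature_c": 18.5, "salinity": 36.6, "ph": 8.1,
    "time_utc": "2005-02-01T12:00:00Z",
    "oxygen": 210.0,  # desirable, not essential
}

print(missing_essentials(sample))
```

Desirable parameters such as oxygen are simply carried along; only the starred elements are treated as blocking.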
    2. Biological related data. -
      1. DNA and RNA sequences - see below
      2. proteomics - see below
      3. metabolomics - see below
      4. lipidomics -see below
      5. staining data - Acridine orange staining, propidium iodide.
        1. Microscopy
        2. FACS
      6. Particle size distributions - in situ camera for marine snow - not common
      7. Need quantitative measurements - number of species and their relative abundance. One concern is how to correct for multiple rRNA operons. The operon problem is big and not solved easily by sequencing. Is there a technical solution? Most organisms may have one or two operons. In the worst case we will be off by a factor of 2 or 10. We recognize there is noise in all molecular techniques, but the operon effect may be no worse than the noise introduced by differential lysis or PCR bias, and the result is still better than what we have learned by cultivation.
        1. qPCR
        2. SARST (but sensitive to operon number in organism)
        3. specific probes
        4. TRFLP
        5. FISH, which is cumbersome but the only way to get absolute (rather than only relative) numbers. Difficult to carry out as a high throughput technology. Currently can only multiplex three dyes.
        6. DNA microarrays to monitor hypervariable regions using labeled PCR amplicons or direct RNA-labeling plus control probes outside of rRNA hypervariable region to measure accessibility of probe on array.
        7. Explore the potential of using entire operon including spacer region to handle the operon problem.
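If operon copy numbers were known or could be estimated per phylotype, the correction discussed above reduces to dividing read counts by copy number before renormalizing. The sketch below assumes such copy numbers are available, which the notes treat as an open problem; the phylotype names and counts are invented.

```python
def correct_for_operons(read_counts, operon_copies, default_copies=1):
    """Convert rRNA read counts to estimated relative cell abundances.

    Each count is divided by the assumed operon copy number of its
    phylotype, then the results are rescaled to sum to 1.
    """
    cell_equivalents = {
        taxon: count / operon_copies.get(taxon, default_copies)
        for taxon, count in read_counts.items()
    }
    total = sum(cell_equivalents.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in cell_equivalents.items()}


# Invented counts and copy numbers, for illustration only.
reads = {"phylotype_A": 600, "phylotype_B": 300, "phylotype_C": 100}
copies = {"phylotype_A": 1, "phylotype_B": 2, "phylotype_C": 10}

corrected = correct_for_operons(reads, copies)
for taxon, fraction in corrected.items():
    print(taxon, round(fraction, 3))
```

A phylotype with ten operons is over-represented tenfold in the raw reads, which is the worst case mentioned above.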
  2. Soft sediment data (Part of Benthic systems)
    1. contextual information
      1. Location* -
      2. Latitude, longitude*
      3. Depth*
      4. Sampling methodology *
      5. Important parameters - need in situ data when possible.
        1. Chemical data
          1. Oxygen*
          2. pH*
          3. H2S*
          4. Porewater chemistry*-gel techniques available.
          5. CO2*
          6. Ca*
          7. Nitrate, Nitrite, Ammonia*
          8. Carbonate*
          9. Hydrogen*
          10. H2O2*
          11. Sulphides*
          12. SO3*
          13. Alkalinity *
          14. two dimensional O2 and pH imaging*
        2. Physical data - will vary depending upon environment e.g. sediments vs. sea floor vs. vents etc.
          1. Time*
          2. Porosity - sediments
          3. Grain size distribution- sediments
          4. Permeability - all
          5. Flows and currents - vents,
          6. Incident light - is it always dark?
          7. Temperature*
          8. Boundary layer characteristics
          9. 2 dimensional Imaging data*
          10. Structural characterization of data - gradients and physical structures.
      6. Desirable parameters - most by probe technology
        1. fluorescence
          1. In situ fluorescence - Chlorophyll measurement.
          2. Variable fluorometry - fluorescence kinetics
        2. DOC
        3. POC
        4. phosphates
        5. turbidity to estimate particulate matter.
        6. in situ time sampling
    2. Biological related data.
      1. sequences - see below
      2. proteomics - see below
      3. metabolomics - see below
      4. lipidomics
      5. staining data - Acridine orange staining, propidium iodide.
        1. Microscopy
          1. Sectioning of frozen sediment samples
          2. Cryo SEM
  3. Technical Biological Capabilities
    1. Nucleic acids - All sequencing studies should have a pilot phase in which sample sequencing will guide high throughput sequencing.
      1. rRNA sequencing- phylotypes - current gold standard but it is not perfect technology.
        1. Concerns about efficient lysis and extraction from all cells.
          1. Cell lysis control - monitor by microscopy
        2. Concerns about differential amplification
        3. Close to full length sequences are desirable -but there is a trade off between throughput and information content.
        4. Development of SARST technology for generating reads appropriate for counting different phylotypes and for detecting novelty.
        5. Normalization may be needed to eliminate major players in what appear to be monolithic communities - alternative is to sequence more deeply
        6. Reconditioning PCR -2nd and 3rd round PCR amplifications to increase sensitivity and minimize cell number requirements.
        7. tRNA carrier to swamp nucleases and RNAase inhibitors to minimize degradation
        8. Need multiple primer sets - ICoMM can provide published list of primers.
        9. Lowest sample sizes are approximately 2000 cells
          1. Can increase scale of sampling in water column samples to get larger number of cells.
      2. Single gene vs. MLST-
        1. MLST can only correlate appearance of different gene families, i.e. major phylotypes with DSR clades or MCR clades etc., vs. genomic context.
        2. Genomic context gets around the problem of interpreting MLST but can't control which MLST are in the same large DNA insert library. MLST in environmental samples is a very different game than MLST in cultured isolates. For both MLST and metagenomics in community sequencing approaches, the classification of the sequenced fragments into "organism bins" is still an unsolved problem when the fragments do not overlap. The overlay of two trees based on different phylogenetic markers is problematic. Intrinsic DNA signatures, mainly based on oligonucleotide frequencies as well as G+C content and codon usage, have been shown to guide the binning process.
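The intrinsic signatures mentioned above can be computed directly from a fragment. The following is a minimal sketch, assuming tetranucleotide frequencies and Euclidean distance as the comparison; real binning tools use more elaborate statistics, and the sequences here are synthetic.

```python
from collections import Counter
from itertools import product
import math


def gc_content(seq):
    """Fraction of G+C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_profile(seq, k=4):
    """Normalized frequency vector over all 4**k k-mers.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Synthetic "genomes": two non-identical fragments from one, one from another.
at_rich = "ATATTTAAATATTAAATTTATA" * 10
gc_rich = "GCGGCCGCGGGCCGCCGGCGCG" * 10
frag_1, frag_2, frag_3 = at_rich[:110], at_rich[3:113], gc_rich[:110]

d_same = profile_distance(kmer_profile(frag_1), kmer_profile(frag_2))
d_diff = profile_distance(kmer_profile(frag_1), kmer_profile(frag_3))
print(gc_content(frag_1), gc_content(frag_3), d_same < d_diff)
```

Fragments from the same source sit close together in signature space even though neither contains the other, which is what makes the approach usable for non-overlapping fragments.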
      3. Environmental or metagenomic
        1. What is the overlap between rRNAs from PCR and rRNAs from metagenomics? WARNING - the assembler may be constructing chimeric sequences with metagenomic datasets.
        2. Need to make rRNA PCR libraries from same DNA samples as we use for construction of small insert and larger insert metagenomics libraries.
        3. Need short insert and intermediate insert libraries and archive sufficient DNA and library material for future analyses. This implies some sort of archiving activity for key libraries and samples.
      4. Large Insert libraries:
        1. Large inserts with rDNA can be toxic. - causes bias in libraries - problem is likely presence of efficient rRNA promoters from cloned gene.
        2. Fosmids - prescreening using FISH to identify clones with rRNAs.
      5. Benthic environments have high variability:
        1. Time series need to be array based - too dynamic.
      6. COGs based upon 25 million base pairs may be sufficient to differentiate between sites.
      7. Nested Combined approach -
        1. Developed concept of a small number of flagship sites for heavy sampling of both the watercolumn and the underlying sediment and detailed workup.
        2. Start with initial pilot project screen study to decide where to do the big study. This would avoid aberrant results in an unfortunate or contaminated site.
        3. Use rRNA to look at microbial diversity. rRNA survey - deep levels, multiple examples of a biotype. Deep would be in terms of extinction curve, up to 10,000 reads. Such sequencing methods can only provide estimates of the relative numbers of different kinds of organisms in the studied population.
          1. SARST or analogous technique to develop a comprehensive understanding of the microbial population structure.
          2. May wish to compare this to diversity of cultivars that can be isolated using high throughput technologies.
          3. Use FISH or qPCR with a universal probe and a probe for a mid- to high-abundance phylotype from the sequencing survey to determine the absolute number of microbes in a sample and to obtain estimates of numbers of microbes of a particular phylotype. These data can then be used to compute the normalized number of different kinds of microbes in the studied population.
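The normalization in the last step can be sketched as follows: the universal probe supplies a total cell count, the sequencing survey supplies relative abundances, and the phylotype-specific probe gives an independent check on one of the scaled values. All numbers and names below are invented for illustration.

```python
def normalize_to_absolute(relative, total_cells_per_ml):
    """Scale survey-derived relative abundances by a universal-probe total."""
    return {taxon: fraction * total_cells_per_ml
            for taxon, fraction in relative.items()}


def anchor_ratio(relative, total_cells_per_ml, anchor_taxon, anchor_cells_per_ml):
    """Ratio of directly counted to survey-predicted cells for one phylotype.

    A value near 1 suggests the survey fractions are consistent with the
    probe counts; a value far from 1 points to bias in one of the methods.
    """
    predicted = relative[anchor_taxon] * total_cells_per_ml
    if predicted <= 0:
        raise ValueError("anchor phylotype absent from the survey")
    return anchor_cells_per_ml / predicted


# Invented survey fractions and probe counts.
relative = {"phylotype_A": 0.5, "phylotype_B": 0.3, "phylotype_C": 0.2}
total = 4.0e5          # cells/mL from the universal probe
anchor_count = 1.6e5   # cells/mL from the phylotype_A probe

absolute = normalize_to_absolute(relative, total)
ratio = anchor_ratio(relative, total, "phylotype_A", anchor_count)
print(absolute, ratio)
```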
        4. Number of examples of biotypes appropriate to study might be driven by informed advice by the community or by experience. This uncertainty is part of exploration. BATS and HOTS - oceanographic moderately deep sites, long time series. Infrastructure exists. Jan identified four sites, i.e. two additional coastal ocean sites. May need time samples as part of the time series.
        5. Metagenomics at high coverage for a few flagship sites -
      8. Single cell genomics. 100 cell genomics is current reality
        1. micro capsules on flow cytometry. Public domain technology.
        2. extrachromosomal DNA elements must be absent - kiss of death to rolling circle methods.
        3. use epicenter kits. Obtain 70kb fragments.
        4. Need at least 100 cells.
        5. Won't have cultivar to work with.
        6. how do we decide which single cell genome to work with.
    2. DNA microarrays
      1. Even random arrays can be valuable for differentiating between DNA populations. - DNA microarrays may be the future of pattern recognition or fingerprinting of community composition.
      2. Also useful for rDNA analyses - can be developed for determining relative numbers. We recognize the operon problem.
      3. Gene expression arrays although capable of providing functional descriptions are currently too complex and unreliable for functional surveys.
    3. Fractionation of population- This is a strategy to obtain subpopulations that are less complex than the total population. Can be done at nucleic acid stage or even according to the organisms life style - i.e. attached vs. non-attached cells etc.
      1. Fractionation according to GC using bisbenzimide.
      2. Differential filtration can also fractionate population but do not throw anything away. Particle fractionation can resolve different kinds of communities but it does eliminate the ability to generate data about relative numbers of all different kinds of organisms in a sample.
      3. FACS (fluorescence activated cell sorting)
      4. Optical tweezer
  4. Cultivation
    1. Should we increase cultured representatives for physiological studies?
    2. One method is cultivation by dilution into low nutrient medium.
    3. Other technique is using beads and forming micro-colonies after running sea water through bead column.
      1. Takes 3-4 weeks to get colony of 100 cells. Look for 100 cell beads vs. 1 cell on a micro-droplet bead. 60% of the time the 100 cell guys grow to 10^8.
      2. Should use environmental conditions to increase efficiencies of successful cultivation. For example, reproduce gradients.
      3. Perhaps use of longer initial incubations to get microgel wells to grow. Instead of 40% success of microgels growing after 3-4 weeks, it may go higher.
      4. Can use FTIR to sort microbial diversity in 10,000 different cultures.
      5. When used to study Sargasso Sea, obtained most of the clades in the Venter data set but the exact overlap was minimal. Unclear whether the Culturing effort or the sequencing effort had failed to approach a plateau in terms of discovery curves. It should also be noted that these studies were on totally different samples.
      6. This suggests an experiment that could inform us about the percent of populations reported by molecular studies that can be cultured.
        1. Carry out high throughput culturing using the micro-colony technique but incubate for several weeks under different conditions e.g. within and without light or under different oxygen conditions before sorting by FACS. It might be good to culture under steep gradient conditions- might be technically challenging.
        2. Carry out molecular surveys to very deep levels i.e. 10,000 clones from the original water sample and 10,000 sorted colonies.
        3. Compare rRNA populations looking for extent of overlap between rRNA amplicons from the cultivars versus from the environmental isolates.
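The comparison in the last step comes down to set overlap between the phylotypes recovered by each route. A minimal sketch, assuming phylotypes have already been clustered and labeled (the labels below are placeholders):

```python
def overlap_summary(environmental, cultured):
    """Summarize overlap between phylotypes seen in clones and in cultures."""
    env, cult = set(environmental), set(cultured)
    shared = env & cult
    union = env | cult
    return {
        "shared": len(shared),
        "fraction_of_environmental_cultured": len(shared) / len(env) if env else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Placeholder phylotype labels, not real data.
from_clones = ["otu1", "otu2", "otu3", "otu4", "otu5"]
from_cultures = ["otu4", "otu5", "otu6"]

summary = overlap_summary(from_clones, from_cultures)
print(summary)
```

Because both surveys are drawn from the same water sample in this design, a low overlap cannot be attributed to the sample differences noted for the Sargasso Sea comparison.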
    4. Use of cultivation in genomics
      1. Discussed Moore sequencing by Venter. Anticipate 5X coverage of 120 genomes - 80 in progress. Phylogenetically dispersed. Should we push for full coverage?
      2. At some level in the phylogenetic tree we would like closed genomes. Closely related genomes benefit quickly from closed genomes. Divergent genomes don't benefit to the same extent from closure.
    5. Use of cultivation in proteomics
      1. Can use on cultivars
      2. Not useful on environmental genomes because of complexity
      3. On pure cultures, proteomics is powerful.
      4. Environmental samples: proteomics may not be able to reproducibly identify functional genes at this time.
  5. Metabolomics and Functional genomics
    1. One strategy for selecting clones to analyze genes in a genomic context is to screen large insert libraries according to function rather than homology.
    2. Nitrilase tree presented from clones isolated according to functional characteristic. There may be many functions of interest to marine biologists for which there already exist functional assays for recombinant clones containing fosmid or BAC sequences that can be screened in a medium- to high-throughput manner, e.g. selection, agar plate colorimetric assay, MT plates, FACS, etc. ICoMM should provide Diversa with a list.
    3. Bio markers in general have potential to be informative
    4. Intermediate metabolites
    5. Single genome metabolism vs. environmental metabolism
      1. Discussed idea that the active metabolism and hence functional diversity of an organism may reflect the microbial population context. In other words activity may reflect communication with other organisms in the population
  6. Database for ICoMM - General comment: This is a special area where ICoMM's activities can make an important contribution. We need a working group to handle the technical details, but it should also include biologists who drive the database construction with scientific questions as well as a biostatistician and database experts. Some overlap with the composition of the technology working group would be highly desirable. An important issue will be making sure that standards allow database interoperability.
    1. Databases that we want to link
      1. MICROBIS
      2. ARB
      3. Micro-Mar
      4. IMCG (Integrated microbial community genomes)
      5. GMOD - type databases for large insert libraries
      6. Database of GMOD-like databases - see http://www.tigr.org/tdb/MBMO or http://www.mbl.edu/giardia
      7. Proteomics and metabolomics.
      8. Recommend meeting for database experts.
      9. Lipidomics. Specific lipids for groups, subgroups or individual prokaryotes give environmental and evolutionary information
    2. Database capabilities today. Must establish standards for sharing data between participating databases. Must push for high-level initiative to organize marine microbial databases that contain comprehensive and curated datasets that allow complex queries and internal and external crossreferencing. ... Will need funding. - ARB is slow because of lack of resources. Funding for ARB has been all but impossible. Is this a job for the UN???? The following are issues that need to be addressed in database construction
      1. Overlay contextual data
      2. Identification of sequences (broadly defined) in GIS context.
      3. Data input - high throughput
      4. Phylogenetic excellence -e.g. ARB and/or CIPRES
      5. Super tree capability
      6. Distribution of enzymatic and bio-activities i.e. function in marine environments.
      7. With ties to biogeochemical processes.
      8. With ties to genomics.
      9. Kinds of additional database that may be constructed.
        1. Functional database
        2. Proteomic database
        3. Lipidomic database
        4. Metabolomics
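Overlaying contextual data and placing sequences in a GIS context both imply that every sequence record can be joined to the location and depth of its sample. The sketch below shows one way such a link could look using an in-memory SQLite database; the table layout, column names and rows are hypothetical and do not describe any of the databases listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    latitude  REAL, longitude REAL, depth_m REAL
);
CREATE TABLE sequence (
    seq_id    TEXT PRIMARY KEY,
    sample_id TEXT REFERENCES sample(sample_id),
    phylotype TEXT
);
""")

# Invented rows for illustration only.
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    ("S1", 31.7, -64.2, 50.0),
    ("S2", 22.8, -158.0, 500.0),
    ("S3", 31.6, -64.1, 1500.0),
])
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", [
    ("q1", "S1", "phylotype_A"),
    ("q2", "S1", "phylotype_B"),
    ("q3", "S2", "phylotype_A"),
    ("q4", "S3", "phylotype_C"),
])


def phylotypes_in_box(conn, lat_min, lat_max, lon_min, lon_max, max_depth_m):
    """List distinct phylotypes sampled inside a lat/lon box above a depth."""
    rows = conn.execute(
        """
        SELECT DISTINCT q.phylotype
        FROM sequence AS q
        JOIN sample AS s ON s.sample_id = q.sample_id
        WHERE s.latitude BETWEEN ? AND ?
          AND s.longitude BETWEEN ? AND ?
          AND s.depth_m <= ?
        ORDER BY q.phylotype
        """,
        (lat_min, lat_max, lon_min, lon_max, max_depth_m),
    ).fetchall()
    return [row[0] for row in rows]


print(phylotypes_in_box(conn, 30.0, 33.0, -66.0, -63.0, 200.0))
```

A shared sample identifier of this kind is the sort of standard that would let separately maintained databases cross-reference each other.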
    3. Database capabilities of the future. - Will require standards
      1. Highly integrated, cross referencing databases of metagenomics, metabolomics, proteomics, lipidomics
      2. Interfaces with high throughput phylogenomics -possibly CIPRES
      3. Big issue of capturing old data - Base line information.
        1. older molecular studies
        2. need information about GIS, location, depth etc.
        3. Biogeochemical information from study sites - may drive new molecular or census-based studies.
        4. Links to Prokaryotes and Bergeys databases?
    4. Money for maintenance is a big issue to resolve
      1. Emphasize concept of working databases and necessary link to curation vs. archival databases
  7. Other issues:
    1. Archival samples - who gets the DNA and who stores it?
    2. Metagenomic clones from John Heidelberg
      1. Small insert clones not saved
      2. Medium to larger inserts were saved
      3. Filters stored in sucrose lysis buffer - as in Giovannoni
      4. Sediment samples require 100's of grams for biogeochemical science.
      5. Cores kept at -70C for nucleic acids
      6. Need to have anaerobic reducing environments. Immediate treatment. Anaerobic hoods on shipboard. A shipboard counting device is important to know where you are, but could use NASA technologies, i.e. LAL or ATP assays.
      7. Phage mentioned but don't know best practice. Disagreement about best practice - Wolmark freezing in glycerol, Foeherster DNAse followed by 4C, Suttle other in between.
    3. Sampling strategy - replication? Density? Best sites? Depth of sampling in terms of numbers of reads. HOTS or BATS do deep sampling.
    4. Developed concept of flagship sites for heavy sampling and detailed workup. Start with initial pilot project screen study to decide where to do the big study. This would avoid aberrant results in an unfortunate or contaminated site.
      1. Use rRNA to look at microbial diversity.
      2. FISH, 5000-1000 clones, BATS and HOTS - oceanographic, moderately deep sites, long time series. Infrastructure exists. There are other coastal ocean sites that might be considered. May need time samples as part of the time series.
  8. Sampling scales
  9. Sample sharing - who makes decisions about precious samples?
  10. Attached sampling note from Chris Scholin who could not attend the meeting.

    Greetings -

    Again, my apologies for not being able to participate in this important workshop. I'd hoped to write something up on water samplers that might be of use as the group considers how it might acquire samples as part of ICoMM's mission. In that regard, we have precious little time at most locations (as in having a human presence) to sample microbial populations and relate changes in their abundance/gene expression to short and long-term alterations of the chem/phys environment. I think this reality, coupled with the value of having physical samples from different environments (even if they're just sitting in the freezer), speaks to the need to automate sample collection and archival.

    There are a number of commercially available instruments out there that may help in this regard. I've collected and attach here some information about samplers that can be combined with other in-water observatory systems to enable collection of physical samples in the context of many other measurements. Indeed, depending on the observatory in question, several sampling modes are possible: pre-programmed (e.g., time-dependent); human in the loop adaptive sampling (i.e., event-dependent - e.g., with respect to output of an optical sensor); autonomous adaptive sampling (as in previous). Some of the samplers provide minimal chemical processing of material collected (e.g., fixation for nucleic acid recovery or whole cell analyses). In any case, it's worth considering use of such devices in conjunction with on-going efforts to establish local, regional and global observatory backbone (incl. fresh waters as well). Enormous investments are being made along those lines, and it seems ICoMM would be well advised to take advantage of that infrastructure as you gain not only infrastructure (operations/platforms) but contextual data products as well.

    Beyond 'simple' samplers, new systems are emerging that are making it possible to conduct molecular analyses remotely, in situ, even subsurface. Two examples are the "autonomous microbial genosensor" (AMG; John Paul et al., U of S Florida, Center for Ocean Technology) and the "environmental sample processor" (ESP; Scholin et al., MBARI). The AMG is designed around NASBA isothermal amplification analyses and has no capacity for sample archival. The ESP emphasizes the interface with the environment and sample processing (archival, extraction, fractionation), and supports a variety of molecular probe analyses (e.g., probe arrays, antibody assays, and many other analyses possible too). There are other in-water sensors that may make it possible to conduct whole cell (e.g., FISH) analyses autonomously, such as the in situ flow cytometer that WHOI researchers are working on (there are other in-water flow cytometer systems out there, but none I'm aware of also allow for probe application).

    I hope this very brief bit of information will be of use. I look forward to participating more in the future --

    Chris Scholin
    Monterey Bay Aquarium Research Institute (MBARI)
    7700 Sandholdt Rd.
    Moss Landing, CA 95039-0628
    phone: 831-775-1779
    fax: 831-775-1620
    web: www.mbari.org