Introduction

The identification of genomic variation helps to clarify the relationship between genotype and phenotype. Single nucleotide polymorphisms (SNPs) are single base differences between DNA sequences of individuals or strains. SNP markers represent the most common type of variation across a genome (Kwok 2001). SNP markers facilitate marker-aided selection (MAS), the detection of alleles associated with disease, the analysis of population history, and the construction of genetic maps (Cardon and Bell 2001; Flint-Garcia et al. 2003; Goya et al. 2010). Traditionally, SNP markers were identified in silico from expressed sequence tag (EST) or genomic sequences, or by Sanger sequencing of candidate genes or PCR products (Gupta et al. 2008). The application of commercial SNP arrays has expanded from genotyping DNA to the detection and characterization of copy number variation and loss of heterozygosity (LaFramboise 2009). However, the rapid development and reduced expense of next-generation sequencing technologies have provided unprecedented opportunities for researchers to find SNP markers, especially in non-model organisms, with greater resolution and accuracy (LaFramboise 2009; Fonseca et al. 2016).

Transcriptome sequencing techniques (i.e., RNA-seq) can detect variants in the coding regions of the genome. RNA-seq provides large amounts of sequencing data at reduced cost (Cloonan et al. 2008; Wilhelm et al. 2008). SNPs in noncoding regions may alter transcript levels by disrupting functional cis-regulatory elements, while those in coding regions of the genome may be silent or responsible for altered forms of proteins (Chepelev et al. 2009). RNA-seq is commonly employed to quantify gene expression levels under different conditions, and to detect alternative splicing, allele-specific expression, gene fusions and RNA editing (Wang et al. 2009; Eswaran et al. 2013; Piskol et al. 2013; Rapaport et al. 2013; Crowley et al. 2015; Conesa et al. 2016). Intrinsic complexities in transcriptomes (e.g., alternative splicing) pose challenges at the computational analysis steps when RNA-seq data are used to identify genomic variants. Several methods have been developed to call variants from RNA-seq data when complete genome information is available (Piskol et al. 2013; Quinn et al. 2013; Deelen et al. 2015). Studies have suggested that imposing strong variant filtering criteria, having sufficient coverage, using relevant tissue, imposing suitable quality control screens and having additional whole-exome sequencing (Cirulli et al. 2010; Seo et al. 2012) can increase the accuracy of variant identification using transcriptome data. Recently, RNA-seq has been employed to systematically identify variants in transcribed regions in different species, mostly in humans (Chepelev et al. 2009; Cirulli et al. 2010; Quinn et al. 2013; Kim et al. 2014) and in some plants (Paritosh et al. 2013; Shearman et al. 2015).

The marine red alga Porphyra umbilicalis Kützing is found from the Northeast Atlantic to the Northwest Atlantic, while its global distribution requires further review (Brodie et al. 2008). Sutherland et al. (2011) showed evidence from molecular data that the name P. umbilicalis has been applied to more than one species. Porphyra umbilicalis is an important food source with high levels of protein and free amino acids, and a high ratio of ω3:ω6 fatty acids (Mouritsen 2013). It has been demonstrated that P. umbilicalis can be used as a biofilter in integrated multi-trophic aquaculture (IMTA) and as a partial replacement for fishmeal (Day et al. 2009; Walker et al. 2009). Porphyra umbilicalis, which is native to the Northwest Atlantic, is an important component of intertidal communities and is a good candidate for developing domesticated strains for aquaculture and for components of IMTA (Blouin et al. 2007). The use of native strains of P. umbilicalis in the Northwest Atlantic can potentially reduce the time, infrastructure, and costs associated with sexual spores (Blouin et al. 2007) and limit the potential ecological impact of introducing Pyropia species from Asia (Neefus et al. 2008).

Amplified fragment length polymorphism (AFLP; Blouin and Brawley 2012) and simple sequence repeat (SSR or microsatellite) markers (Eriksen et al. 2016) have been applied to analyze the genetic diversity and variation of the Northwest Atlantic populations of P. umbilicalis. However, the limited number of SSR markers used hinders direct comparison of the microsatellite study with the AFLP assessment of P. umbilicalis genetic variation. Furthermore, large microbial communities are associated with P. umbilicalis, living on the surface and inside the cell wall of the alga (Miranda et al. 2013). Bacterial DNA contamination of the Porphyra DNA pool may have generated more variable AFLP profiles. More genetic markers that can be firmly tied to the Porphyra genome are needed to analyze the genetic diversity and structure of Northwest Atlantic populations, and to aid strain selection for integrated multi-trophic aquaculture.

In the Northeast Atlantic, the red alga P. umbilicalis can reproduce both sexually and asexually (Brodie and Irvine 2003); however, the Northwest Atlantic populations reproduce only asexually (Bird and McLachlan 1992; Blouin et al. 2007; Gantt et al. 2010). The Northwest Atlantic P. umbilicalis populations are found over a wide latitudinal range and occupy both rocky, open coastal habitats and estuarine tidal rapids (Eriksen et al. 2016). How these asexual populations adapt to different habitats is still unclear. According to Ingolfsson (1992), the Northwest Atlantic rocky coast species were extirpated during the last glacial period, and the current populations are assumed to be the descendants of Northeast Atlantic species that were introduced from glacial refugia via post-glacial trans-Atlantic currents. Elucidating the genetic diversity and variation of P. umbilicalis in the Northwest Atlantic could help clarify how P. umbilicalis colonized the Gulf of Maine after the last glacial maximum, and how asexual P. umbilicalis populations survive in different habitats, such as open coastal and estuarine tidal rapid habitats.

In this study, a computational pipeline was developed to identify SNP markers from RNA-seq data in P. umbilicalis. This pipeline accounts for the specific characteristics of RNA-seq experiments, the biological characteristics of P. umbilicalis (microbial contamination), and the absence of a genome assembly of P. umbilicalis at the time this study was carried out. Twenty-five of the putative SNP markers were selected for validation, and five validated markers were then used in a pilot study to examine genetic diversity and population structure of P. umbilicalis within the Gulf of Maine.

Material and methods

Species identification: gametophyte culture, RNA isolation, and sequencing

Four “individuals” (each defined as thalli connected to a single holdfast) were collected randomly along the rocky shore during low tide at Schoodic Point, ME (44° 20′ 11.3″ N 68° 03′ 23.3″ W). Because morphological identification of P. umbilicalis is error prone (Klein et al. 2003), the rbcL-rbcS intergenic spacer gene was sequenced for all individuals used in this study to confirm species identity (Teasdale et al. 2002). The individuals that were confirmed to be P. umbilicalis were put into culture.

Release of neutral spores (asexual spores) was induced from the four wild P. umbilicalis blades from Schoodic Point, ME, USA; the spores were isolated and cultured in a lab-controlled environment according to Redmond et al. (2014). Blade materials from each culture were pooled, and then each pool was subjected to one of the five different treatment conditions (Table 1). The total RNA was extracted and assessed for quality and quantity using a NanoDrop 2000c spectrophotometer (ThermoFisher Scientific) and an Agilent 2100 Bioanalyzer. The cDNA libraries were prepared with the Illumina TruSeq RNA Preparation Kit (Illumina, USA) and then sequenced on an Illumina HiSeq 2000 at the Hubbard Center for Genome Studies at the University of New Hampshire.

Table 1 Treatments applied to each pooled culture

Reference transcriptome construction

Five libraries of transcriptome sequences from P. umbilicalis were generated under five environmental treatments; these transcriptomes were used to build the reference transcriptome assembly. Reads were first error corrected with BLESS (Heo et al. 2014) using a k-mer length of 25 and then trimmed with Trimmomatic (Bolger et al. 2014) at a Phred score of 2 (Macmanes 2014) to remove low-quality reads and adapters. Subsequently, Trinity (Grabherr et al. 2011) was applied for de novo assembly with digital normalization. Because bacteria are known to live on the surface and inside the cell walls of P. umbilicalis (Miranda et al. 2013), the raw de novo assembly was screened by BLAST analysis against the genome of the red alga Chondrus crispus Stackhouse (Collén et al. 2013), as well as the P. umbilicalis EST reference (Chan et al. 2012), to eliminate potential bacterial contamination. Only contigs that were similar to either the C. crispus genome or the P. umbilicalis EST reference with E values lower than 1e−10 were retained, resulting in a contamination-free, partial transcriptome reference for P. umbilicalis. The quality of the contamination-free reference was assessed by the number of “genes,” the number of “transcripts,” the GC content and the N50 value. The completeness of the contamination-free reference was assessed with the Core Eukaryotic Genes Mapping Approach (CEGMA, Parra et al. 2007).
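
The contamination screen can be illustrated with a minimal Python sketch. It assumes that the BLAST results are available in tabular form (query ID in column 1, E value in column 11), that a contig is retained when it has at least one hit with an E value at or below 1e−10 (the usual sense of a similarity cutoff), and that hits against either reference are sufficient. The function names and toy records are hypothetical and are not taken from the pipeline used in this study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def contigs_with_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Return query contig IDs with at least one hit at or below the cutoff.

    Assumes tab-separated BLAST output with the query ID in column 1 and
    the E value in column 11 (an assumption about the file layout).
    """
    keep = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= cutoff:
            keep.add(row[0])
    return keep


def retained_contigs(hit_tables, cutoff=E_VALUE_CUTOFF):
    """Union of contigs supported by any reference (e.g. genome or EST set)."""
    kept = set()
    for table in hit_tables:
        kept |= contigs_with_hits(table, cutoff)
    return kept


# Toy records (hypothetical): contig2 only has a weak hit and is discarded.
vs_genome = io.StringIO(
    "contig1\tchr1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "contig2\tchr2\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30\n"
)
vs_est = io.StringIO(
    "contig3\test9\t99.0\t150\t1\t0\t1\t150\t1\t150\t2e-40\t250\n"
)
kept = retained_contigs([vs_genome, vs_est])
print(sorted(kept))  # ['contig1', 'contig3']
```

Keeping the union of the two screens, rather than the intersection, reflects the "either ... or" wording of the criterion.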

SNP detection

Reads from all five libraries were first trimmed at a Phred score of 20 to eliminate low-quality reads, and each library was then mapped separately to the reference transcriptome assembly with BWA (Li and Durbin 2010). Samtools (Li et al. 2009a) was used to remove alignments with mapping quality lower than 30. Picard (The Broad Institute; http://picard.sourceforge.net) was used to mark duplicates and sort. SNP calling was performed using the Genome Analysis Toolkit (GATK, McKenna et al. 2010) with Split’N’Trim. Indels and large structural variants were not analyzed. The raw putative SNPs were further filtered using a sliding window of 35 bp and a cluster size of 3, based on the following criteria: (1) sequence depth at the SNP position ≥ 20; (2) FisherStrand (FS) ≤ 60; (3) RMSMappingQuality (MQ) ≥ 40; (4) MappingQualityRankSumTest (MQRankSum) ≥ − 12.5; (5) ReadPosRankSumTest (ReadPosRankSum) ≥ − 8.0. After filtering, SNPs that were common to all five libraries were considered putative P. umbilicalis population SNPs and used for further validation. The genes containing SNPs that were unique to an individual treatment were annotated using Blast2GO (Conesa et al. 2005).
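
The filtering logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GATK implementation: SNP records are represented as plain dictionaries with hypothetical keys, missing rank-sum annotations are treated as passing, and a cluster is taken to be three or more SNPs that fit within 35 bases on the same contig (the exact window boundary used by GATK may differ).

```python
from collections import defaultdict


def passes_hard_filters(snp):
    """Apply the five thresholds listed in the text to one SNP record."""
    def at_least(key, minimum):
        value = snp.get(key)
        # Assumption: a missing annotation does not fail the filter.
        return value is None or value >= minimum

    return (
        snp["DP"] >= 20
        and snp["FS"] <= 60.0
        and snp["MQ"] >= 40.0
        and at_least("MQRankSum", -12.5)
        and at_least("ReadPosRankSum", -8.0)
    )


def clustered_positions(positions, window=35, cluster=3):
    """Positions belonging to a run of `cluster` SNPs that fit in `window` bases."""
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - cluster + 1):
        if pos[i + cluster - 1] - pos[i] < window:
            flagged.update(pos[i:i + cluster])
    return flagged


def filter_library(snps, window=35, cluster=3):
    """Return (contig, pos, alt) keys of SNPs that survive all filters."""
    by_contig = defaultdict(list)
    for s in snps:
        by_contig[s["contig"]].append(s["pos"])
    clustered = {
        (contig, p)
        for contig, plist in by_contig.items()
        for p in clustered_positions(plist, window, cluster)
    }
    return {
        (s["contig"], s["pos"], s["alt"])
        for s in snps
        if passes_hard_filters(s) and (s["contig"], s["pos"]) not in clustered
    }


def shared_snps(libraries):
    """SNPs that pass the filters in every library."""
    return set.intersection(*(filter_library(lib) for lib in libraries))


def make_snp(contig, pos, alt, DP=50, FS=1.0, MQ=60.0):
    """Build a toy record (hypothetical values) for the example below."""
    return {"contig": contig, "pos": pos, "alt": alt, "DP": DP, "FS": FS, "MQ": MQ}


lib_a = [
    make_snp("c1", 100, "A"),
    make_snp("c1", 500, "G", DP=10),   # fails the depth filter
    make_snp("c2", 10, "T"),           # c2 SNPs form a cluster of three
    make_snp("c2", 20, "C"),
    make_snp("c2", 30, "G"),
]
lib_b = [make_snp("c1", 100, "A"), make_snp("c1", 500, "G")]
common = shared_snps([lib_a, lib_b])
print(common)  # {('c1', 100, 'A')}
```

Library-specific SNPs could be obtained in the same way by subtracting the union of the other libraries from each filtered set.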

SNP validation

Twenty-five putative SNPs identified by the computational pipeline were randomly chosen for validation using five individuals collected from five sites: Fort Stark, NH; Dover Point, NH; Lubec, ME; Reid State Park, ME; Nubble Light, ME (Fig. 4). Among these 25 putative SNPs, “true SNPs” were those confirmed by Sanger sequencing to be polymorphic among the five individuals. The “true SNP rate” is used as a measure of the success of the current pipeline in detecting SNP markers from transcriptome data in P. umbilicalis. The sequence adjacent to each target SNP was retrieved from the RNA-seq data to design primers. Primer pairs targeting each putative SNP were designed with Primer 3 software (available at: http://primer3.ut.ee/). Polymerase chain reaction (PCR) conditions were optimized for each primer pair and then used to amplify various DNA templates for the targeted SNP regions. Amplification of each target region was performed in 25 μL reaction volumes containing about 25–125 ng genomic DNA, 1× Q5 One Taq Standard Reaction Buffer (New England BioLabs, USA), 200 μM dNTPs, 0.2 μM of each forward and reverse primer (Table 2), and 0.75 U One Taq Hot Start DNA Polymerase (New England BioLabs). PCR conditions started with an initial denaturation step of 94 °C for 3 min, followed by 35 cycles of 30 s at 95 °C, 1 min at a primer-specific annealing temperature (Table 2), and 1 min extension at 72 °C. The amplification ended with a final extension at 72 °C for 5 min. The amplicon sizes were the same as predicted from the transcriptome sequence. The amplicons were purified using both the QIAquick Gel Extraction Kit and ExoSAP-IT (Affymetrix.com) and then sent for Sanger sequencing at GENEWIZ Company, USA. The Sanger sequencing results were aligned using MAFFT (http://mafft.cbrc.jp/alignment/server/) to detect SNPs. Trace files were used to further confirm SNPs found by MAFFT.

Table 2 Primers used to amplify regions containing SNP markers in P. umbilicalis

Genetic diversity and population structure

A total of five SNP markers (2, 8, 13, 18, and 22) that were validated by Sanger sequencing were further used in the pilot population study. Genes containing these SNPs were annotated for their functions using Blast2GO (Conesa et al. 2005). A total of 37 individuals from seven sampling sites (Fig. 4) were assayed for these five SNP markers. Of these seven populations, five are open coastal and two are estuarine tidal-rapid populations (Dover Point NH and Wiscasset ME); the estuarine populations are geographically close to the open coastal populations at Fort Stark NH and Reid State Park ME, respectively. PCR amplification conditions were the same as described above.

The major allele frequency, gene diversity, polymorphism information content (PIC), and Nei’s genetic distances for each population were calculated using PowerMarker 3.25 (Liu and Muse 2005). The SNP data were coded as follows: A = 1, C = 2, G = 3, T = 4, and missing data were coded as 0, as suggested in the GenAlEx v. 6.5 user manual (Peakall and Smouse 2006, 2012). Analysis of molecular variance (AMOVA) and principal coordinate analysis (PCoA) were performed in GenAlEx v. 6.41 (Peakall and Smouse 2006, 2012). In addition, a Mantel test with 9999 permutations was conducted in GenAlEx v. 6.5 to test for correlation between Nei’s genetic distance and geographic distance. Because neutral spores travel along currents, geographic distance was calculated according to Eriksen (2014).
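
As an illustration of the allele coding and the per-locus summary statistics, the following Python sketch encodes base calls with the scheme given above and computes major allele frequency, gene diversity, and PIC from allele frequencies. It uses the textbook formulas (gene diversity as 1 − Σp², PIC following Botstein et al.) without any sample-size correction; the estimators implemented in PowerMarker may differ in that respect, so this is an assumption rather than a reproduction of the software.

```python
from collections import Counter

ALLELE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}  # coding given in the text
MISSING = 0


def encode(base):
    """Map a base call to its numeric code; anything else is missing (0)."""
    return ALLELE_CODE.get(base.upper(), MISSING)


def allele_frequencies(codes):
    """Allele frequencies at one locus, ignoring missing calls."""
    observed = [c for c in codes if c != MISSING]
    if not observed:
        return {}
    counts = Counter(observed)
    return {allele: k / len(observed) for allele, k in counts.items()}


def gene_diversity(freqs):
    """Uncorrected gene diversity: 1 minus the sum of squared frequencies."""
    if not freqs:
        return 0.0
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content (uncorrected)."""
    p = list(freqs.values())
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return gene_diversity(freqs) - cross


# Toy locus (hypothetical calls): three A, one G, one missing call.
codes = [encode(b) for b in "AAAGN"]
freqs = allele_frequencies(codes)
print(codes)                  # [1, 1, 1, 3, 0]
print(max(freqs.values()))    # 0.75
print(gene_diversity(freqs))  # 0.375
print(pic(freqs))             # 0.3046875
```

A monomorphic locus gives a gene diversity and PIC of 0, which corresponds to the lower bound of the ranges reported for the populations in this study.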

Results

Reference transcriptome

Comparisons between the raw Trinity transcriptome and the transcriptome reference corrected for contamination are summarized in Table 3. There were 182,905 contigs in the raw reference taken directly from the Trinity output, with a GC content of 56.47% and an N50 of 512 bp. There were far fewer contigs in the contamination-free transcriptome reference after the clean-up step (42,802 contigs), with a higher N50 length (1090 bp) and a higher GC content (62.28%). However, according to the CEGMA analysis, the transcriptome reference without contamination was 4% less complete than the raw Trinity transcriptome reference.
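
For readers unfamiliar with the assembly statistics compared here, the sketch below shows one common way to compute N50 and GC content from a set of contigs. It is a generic illustration with hypothetical toy values; it is not the script used to produce Table 3, and assembly tools may handle ambiguous bases differently.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def gc_content(sequences):
    """Percentage of G and C bases over all bases in the sequences."""
    total = sum(len(s) for s in sequences)
    if total == 0:
        return 0.0
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return 100.0 * gc / total


# Toy values (hypothetical): four contigs of 100, 200, 300 and 400 bp.
print(n50([100, 200, 300, 400]))     # 300
print(gc_content(["GGCC", "ATAT"]))  # 50.0
```

Removing many short contigs, as the contamination screen does, tends to raise N50 even when no contig becomes longer, which is consistent with the direction of the change reported here.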

Table 3 Assembly result comparison between Trinity output and Trinity minus contamination output

SNP calling and functional annotation of library-specific SNP-containing genes

With stringent filtration, about 90% of the raw SNPs were filtered out. There were 549 putative SNPs common to all five libraries (Fig. 1). The functional annotation of genes containing unique putative SNPs in each library is shown in Fig. 2. There were fewer unique SNPs in genes annotated as responding to stress stimulus in library C (air dried, frozen, and then cultured for 2 weeks) compared to other libraries.

Fig. 1

Venn diagram of putative SNPs in each treatment library. Each oval represents one library. A total of 549 putative SNPs were found in common to all five libraries. Unique putative SNPs found: A-185; B-354; C-144; D-467; E-510

Fig. 2

Comparison of functional annotation of genes containing library-specific SNPs. The y-axis shows the percentage of genes with the corresponding function

SNP validation

Primer pairs were designed for 25 gene transcripts containing putative SNPs. Target DNAs were amplified successfully for 13 of the 25 primer pairs. These primer pairs were used to amplify DNA from five individuals, each from one of the five different populations. The resulting amplicons were sent for Sanger sequencing at GENEWIZ (http://www.genewiz.com/). Products of primer pair 4 did not contain the designed SNP, so this gene was dropped from further analysis. Amplicons from primer pairs 3 and 17 exhibited double peaks in the trace files from some individuals, even when sequenced from both ends. Among the remaining 10 primer pairs, seven were polymorphic based on screening five individual P. umbilicalis samples (SNP 2, 8, 13, 14, 15, 18, and 22). Two of the primer pairs (SNP 10 and SNP 16) produced monomorphic amplicons for the five P. umbilicalis individuals tested. SNP 19 was monomorphic at the predicted SNP position but polymorphic at another position. Seven of the 10 scorable primer pairs therefore confirmed the predicted SNP (70%); the true SNP detection rate was likely higher than 70% since only five individuals were used to validate each putative SNP.
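
The tally behind the 70% figure can be written out explicitly. The short Python sketch below only restates the counts given in this section; the variable names are hypothetical.

```python
# Counts taken from the validation results in the text.
designed = 25               # primer pairs designed
amplified = 13              # primer pairs that amplified successfully
dropped_no_snp = 1          # primer pair 4: designed SNP absent from product
dropped_double_peaks = 2    # primer pairs 3 and 17: double peaks in traces

scorable = amplified - dropped_no_snp - dropped_double_peaks

polymorphic = 7             # SNP 2, 8, 13, 14, 15, 18, 22
monomorphic = 2             # SNP 10 and SNP 16
polymorphic_elsewhere = 1   # SNP 19: monomorphic at the predicted position

# The three outcome classes should account for every scorable primer pair.
assert scorable == polymorphic + monomorphic + polymorphic_elsewhere

true_snp_rate = polymorphic / scorable
print(f"{scorable} scorable primer pairs, true SNP rate = {true_snp_rate:.0%}")
# 10 scorable primer pairs, true SNP rate = 70%
```

Because the denominator excludes primer pairs that failed to amplify or gave ambiguous traces, the rate describes scorable loci only, not all 25 designed assays.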

Pilot population study

Five SNP markers chosen at random from the seven validated polymorphic SNP loci were used for the population genetic study. The predicted functions of the genes containing these five SNPs were 6-phosphogluconate dehydrogenase decarboxylating (SNP 8), mRNA export factor/elongation factor (SNP 13), translocation protein 3Ec63 homolog (SNP 18), and unknown (SNP 2 and 22). The sequenced regions contained more SNPs than those predicted from the RNA-seq libraries: an additional eight SNPs were identified by comparing two or more amplicons from the same primers. Thus, 13 polymorphic SNPs were characterized in this study. The gene diversity and polymorphism information content (PIC) value for each population are shown in Table 4. The gene diversity ranged from 0 to 0.16 and the PIC value ranged from 0 to 0.131. Dover Point, Nubble Light, Reid State Park, and Wiscasset had the lowest genetic diversity and PIC values. Fort Stark and Lubec populations had intermediate levels of genetic diversity. The highest genetic diversity was found in the population collected from Schoodic Point.

Table 4 Genetic diversity of seven P. umbilicalis populations

Population structure

Genetic distance and genetic differentiation across the sampling sites are summarized in Table 5. The genetic distances were highest between Reid State Park and the rest of the populations, ranging from 0.171 to 0.308. Based on the Mantel test, there was no evidence of isolation by distance, that is, no positive correlation between geographic distance and genetic distance (Rxy = − 0.344, p = 0.613). Based on genetic distance, PCoA clustered three open coast populations (Fort Stark, Lubec, and Schoodic Point) into a central group, and Nubble Light and Dover Point into another group (Fig. 3). Reid State Park and Wiscasset were both on the right side of the PCoA but separated by PCoA2 (Fig. 3). AMOVA showed that there was more variation among populations (59%) than within populations (41%), indicating some level of clonality within populations.
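
A Mantel test of this kind can be sketched in pure Python as follows. The sketch assumes a Pearson correlation between the upper triangles of the two distance matrices, permutation of the site labels of one matrix, and a one-tailed p value (the proportion of permutations with a correlation at least as large as the observed one). These choices are assumptions made for illustration and may not match the GenAlEx implementation in every detail; the toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling the sites."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=9999, seed=0):
    """Return (r, one-tailed p) for the correlation of two distance matrices."""
    rng = random.Random(seed)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    order = list(range(len(genetic)))
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Toy example (hypothetical): four sites on a line, genetic distance
# proportional to geographic distance, so r should be 1.
sites = range(4)
geographic = [[abs(i - j) * 10.0 for j in sites] for i in sites]
genetic = [[abs(i - j) * 0.1 for j in sites] for i in sites]
r, p = mantel(genetic, geographic, permutations=999)
print(round(r, 6))  # 1.0
```

Under a one-tailed test of this form, a negative observed correlation yields a large p value, as in the result reported above.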

Table 5 Genetic distance (below diagonal) and genetic differentiation (above diagonal) among seven sample sites of Porphyra within the Gulf of Maine

Fig. 3

Principal coordinate analysis (PCoA) based on genetic distance among seven populations. Dover Point and Nubble Light populations mapped to the same PCoA coordinate. The percentage of the total variation explained by each axis is shown

Figure 4 shows the frequency of genotypes in each population. In addition to the genotypes unique to Schoodic Point (genotype 4–genotype 9), two other unique genotypes were found, in Wiscasset (genotype 3) and Reid State Park (genotype 1). A common genotype (genotype 2) was found in all populations except Reid State Park ME and Wiscasset ME. Schoodic Point ME had the highest genotypic diversity, followed by Lubec ME and Fort Stark NH.
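
Genotype frequencies of this kind can be tallied by treating the alleles at all scored SNP sites of an individual as one multilocus genotype. The Python sketch below assumes one allele per site per individual and represents each genotype as a string of base calls; the site names and genotypes are hypothetical toy data, not the data from this study.

```python
from collections import Counter, defaultdict


def genotype_frequencies(samples):
    """Frequency of each multilocus genotype within each population.

    `samples` is an iterable of (population, genotype string) pairs.
    """
    counts = defaultdict(Counter)
    for population, genotype in samples:
        counts[population][genotype] += 1
    freqs = {}
    for population, counter in counts.items():
        total = sum(counter.values())
        freqs[population] = {g: k / total for g, k in counter.items()}
    return freqs


def private_genotypes(samples):
    """Genotypes found in exactly one population, mapped to that population."""
    populations = defaultdict(set)
    for population, genotype in samples:
        populations[genotype].add(population)
    return {g: next(iter(p)) for g, p in populations.items() if len(p) == 1}


# Toy data (hypothetical): four individuals scored at four SNP sites.
samples = [
    ("site1", "ACGT"),
    ("site1", "ACGT"),
    ("site2", "ACGT"),
    ("site2", "ACGA"),
]
print(genotype_frequencies(samples))
# {'site1': {'ACGT': 1.0}, 'site2': {'ACGT': 0.5, 'ACGA': 0.5}}
print(private_genotypes(samples))
# {'ACGA': 'site2'}
```

With small samples per site, a genotype that appears private may simply be unsampled elsewhere, so such tallies are best read as lower bounds on sharing.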

Fig. 4

Pie graph showing genotype frequency in each population. Each color represents a different genotype. The number is the name of the genotype. NH stands for New Hampshire and ME stands for Maine

Discussion

SNP discovery and validation

Several bioinformatic pipelines have been developed to detect SNPs using genome information; these methods were reviewed by Pabinger et al. (2014). However, most of the existing pipelines were designed on the assumption that a good reference genome is available and thus were species-specific, such as SOAPsnp (Li et al. 2009b), SNPdetector (Zhang et al. 2005), and SNPiR (Piskol et al. 2013). Several other pipelines have been successfully applied to both model and non-model species using only transcriptome data without a genome reference (Li and Godzik 2006; Van Belleghem et al. 2012; Piskol et al. 2013; Romiguier et al. 2014; Lopez-Maestre et al. 2016). In green algae, SNP markers have been identified using transcriptome data only (Li et al. 2014). This study benchmarks the detection of SNP markers in P. umbilicalis using only a transcriptome dataset.

The computational pipeline we used accounted for the lack of a reference genome for P. umbilicalis at the time the research was conducted, the intrinsic characteristics of RNA-seq experiments, and the extensive microbial contamination of P. umbilicalis tissues (Miranda et al. 2013). Filtering the transcriptome assemblies against the Porphyra EST database (Chan et al. 2012) and the C. crispus genome (Collén et al. 2013) eliminated microbial contamination and ensured that the SNP markers found came from Porphyra. After filtering, the contamination-free transcriptome reference assembly was estimated to be slightly less complete (83 versus 87%) than the original Trinity reference transcriptome.

We found a total of 549 putative SNPs common to the five Porphyra RNA-seq libraries. The SNP validation results showed that at least 70% of the scorable SNPs identified by the bioinformatic pipeline were true SNPs. The true SNP detection rate was likely higher than 70% since only five individuals from different algal populations were screened for polymorphism for each amplicon. Increasing the number of individuals or the number of sampling sites will likely increase the true SNP detection rate. These 549 SNPs are good candidate markers for further genetic diversity and population structure analysis, especially for studying how mutations accumulate in asexual Porphyra populations in the Northwest Atlantic.

Library C went through air-drying and freezing stress conditions, which might be expected to cause more stress-related genes to be expressed. However, fewer unique SNPs were detected in genes annotated as responding to stimulus (stress related) in library C compared to other libraries. Based on the annotation results, we suspect that some library-specific SNPs were identified because of differences in transcript coverage among libraries rather than because of differential gene expression. Our results suggest that RNA-seq data are suitable for finding SNPs without bias towards specific treatments. We also identified two SNPs (3 and 17) that had high background noise in the sequencing files. This high background noise could be the result of polymorphisms from recent gene duplications; this is a drawback of not having a reference genome with information about underlying gene family structure.

Population genetics analysis

A total of 13 polymorphic SNP sites were used to investigate the genetic diversity and population structure of P. umbilicalis from seven populations within the Gulf of Maine. A total of 11 genotypes were found in our study. Genotype 2 was present in a wide range of environmental conditions from the northern border of Maine to New Hampshire, as well as the estuarine environment at Dover Point. A “general purpose genotype” (Baker 1965) that can confer broad environmental tolerance will rapidly increase its frequency in a population without the selection of a locally adapted variant. Genotype 2 in our study is likely a “general purpose genotype.” However, this “general purpose genotype” may not be able to deal with all changing environments (Selander and Hudson 1976) as genotype 2 was not detected in Wiscasset ME and Reid State Park ME, which have unique environmental characteristics.

The genetic diversity of asexual populations depends on the number of possible mutations that occur over time in the populations, and the proportion of these mutations that persist through time (Good et al. 2012). In general, postglacial recolonization from refugia is related to the genetic diversity of the rocky shore fauna and marine macroalgae in the Northwest Atlantic (Ingolfsson 1992; Teasdale and Klein 2010). The population from Schoodic Point had the highest genotypic diversity, with each individual examined having a unique genotype; this population also had the highest genetic diversity. Schoodic Point is the northernmost of the seven populations and is very likely to be somewhat older than the more southerly populations in New England (Teasdale and Klein 2010). However, the high genetic diversity observed for Schoodic Point could also reflect ascertainment bias, as the SNPs were originally discovered using cultures established from multiple individuals from Schoodic Point. This would explain why the Schoodic Point population had the highest genetic diversity and was the most genotypically diverse. The populations with low (or no) allelic diversity (Dover Point, Nubble Light, Wiscasset, and Reid State Park) are all marginal populations. Marginal populations were shown to have decreased genetic and allelic diversity in the brown alga Laminaria digitata (Oppliger et al. 2014). Also, the Dover Point, Wiscasset, and Reid State Park populations inhabit somewhat atypical environments for P. umbilicalis (estuarine tidal rapids, or sandy substrata). Selection in these atypical environments may further reduce the genetic diversity of these populations. Low genetic diversity would limit the ability of populations to adapt to changing environments and thus impact their long-term survival potential, especially under stressful conditions (Markert et al. 2010).

In addition to the seven unique genotypes found within the Schoodic Point population, there were two other novel genotypes. One was from the estuarine population at Wiscasset ME and the other was from the nearby coastal population at Reid State Park ME. The unique genotype from Wiscasset (genotype 3) differed from genotype 2 at only one locus and may be the result of genetic drift. Another possible explanation is that the low salinity in the estuarine environment induced the unique genotype, as suggested by Ram and Hadany (2014); stress-induced mutagenesis can help generate a better adapted genotype in an extreme environment. The other unique genotype (genotype 1) was only present in the Reid State Park population. This genotype differs from genotype 2 at three loci. These three loci were not in the same gene, but all reached fixation based on our limited sampling of the population (five individuals). Besides the possibility of genetic drift, it is also possible that these three fixed mutations arose in succession through three hard selective sweeps, or in one clone under the effect of clonal interference. Lang et al. (2011) suggested that, in asexual populations, clonal interference is far more likely to affect any given mutation than a selective sweep. It is very likely that we missed other clones in the Reid State Park population. The genetic distance and differentiation results also showed significantly high genetic differentiation between Reid State Park and the rest of the populations. It is not clear why Reid State Park is so strongly differentiated from the other populations. Bottlenecks or extirpation events in the past followed by subsequent recolonization could lead to high genetic differentiation. It could also be caused by the unique environment at Reid State Park, as the sampling site there has more sandy sediment nearby. However, larger sample sizes from Reid State Park are needed before any conclusion can be drawn about the high genetic differentiation at Reid State Park.

The previous study of genetic diversity of these same populations (Eriksen et al. 2016) looked at a much larger number of individuals (221) using three polymorphic SSR markers. A total of six genotypes were identified by SSR, compared to the 11 genotypes found in this study, which sampled 37 individuals. One explanation for the lower level of polymorphism identified by SSR loci is that the SSR markers used by Eriksen et al. (2016) were developed from protein coding regions. Expansion and contraction of SSR regions in protein coding regions may be more functionally constrained than SNPs in the same regions. Different genotype patterns were also observed by Eriksen et al. (2016); they reported two or more SSR genotypes in each population, with the largest number of genotypes (4) observed for the Fort Stark NH population. By contrast, the present study found that four populations had single genotypes (Reid State Park ME, Wiscasset ME, Nubble Light ME, and Dover Point NH) and that Schoodic Point had the highest number of genotypes. Different patterns of genetic diversity revealed by SNP markers and microsatellite markers were also reported in the red alga Chondrus crispus (Provan et al. 2013). However, we found far fewer genotypes in our study compared to Blouin and Brawley (2012), who reported 41 clones in 51 individuals from 2 populations in ME. The difference between these two studies was probably because AFLP DNA fingerprinting was influenced by non-target DNA from cryptic contaminants.

With the predominantly asexual reproduction of P. umbilicalis in the Northwest Atlantic (Blouin and Brawley 2012), genetic differentiation among Porphyra populations is restricted by the dispersal ability of neutral spores. There is no report on how far Porphyra neutral spores can travel. In a natural environment with constantly stirred water, spores tend to stay near the surface. The rate of spore sinking is low, irrespective of spore size, in other algal species (Hoffmann and Camus 1989). Under lab conditions, however, Porphyra spores tend to settle and attach to the substrate within 12–24 h of release in still water (unpub. data). In general, the genetic differentiation among the seven populations reported here was much higher than that reported with EST-SSR markers (Eriksen et al. 2016), which agrees with the finding in the red alga Furcellaria lumbricalis that the level of genetic differentiation reported by SNP markers was higher than that reported by EST-derived neutral microsatellite markers (Olsson and Korpelainen 2013). The genetic differentiation among the three geographically close southern populations (Fort Stark, Nubble Light, and Dover Point) was low and insignificant, possibly because these three populations are within the dispersal range of neutral spores (< 30 km apart). It is possible that the genetic differentiation among these populations is mainly driven by migration. The high genetic differentiation found among the other populations was consistent with the finding in the red alga Corallina officinalis by SNP markers.

We found that genetic variation was slightly higher among populations than within populations based on AMOVA results, suggesting that genetic differentiation may be caused by genetic drift or selection (Excoffier et al. 1992). Genetic drift and selection are more likely to act on small, relatively isolated estuarine populations like Wiscasset ME and Dover Point NH or on populations in atypical environments like Reid State Park ME. Although Reid State Park is geographically closest to the estuarine population at Wiscasset, it was genetically most distant from Wiscasset. Genetic drift and selection for specific environments (estuarine environment for Wiscasset and more sandy sediment for Reid State Park) may play an important role in the high genetic differentiation between these two nearby populations (27 km apart).

It is worth mentioning that the observed genetic structure and genotype variation for P. umbilicalis may also vary within and between years. Drenth et al. (1994) showed that the genetic structure of asexual fungal populations was different every year, as less than 10% of the genotypes survived from year to year. Selection has the ability to change the genetic structure of small populations rapidly (Worrall 2012), and selection can vary considerably from year to year within a population (Price et al. 1984; Milner et al. 1999). In P. umbilicalis, as population size drops in the summer due to thermal and UV stress, random genetic drift will increase, because the power of random genetic drift is inversely proportional to the effective population size (Lynch et al. 2016). Environmental impacts on genetic diversity suggest that it is better to collect samples in the same season in order to accurately compare genetic diversity and population structure. The samples used in our study were originally collected by Eriksen et al. (2016) at different times of the year, and thus our inferences about population structure should be viewed with caution. In future studies, inclusion of a larger number of markers and more samples per population should yield a better estimate of the genetic diversity and population structure of P. umbilicalis within the Gulf of Maine.