02/04/16 Firefly dataset discussion

On Thursday afternoon, 02/04/16, Andrew, Logan, Boha and I (Saisi) met in the conference room and discussed our firefly dataset (https://figshare.com/articles/LTER_Lampyrid_data/2068098) for the first time. The discussion was productive, and we came away with: 1. the metadata and DRP (Data Reuse Plan) information for the dataset; 2. hypotheses for the data analysis and paper drafting, plus questions for Christie on how we can dig deeper into the data.

1. Data Reuse Plan Worksheet and Metadata

Variable      Description                                                    Units
Sample Date   Sample date
Treatment     Treatment ID
Replicate     Replicate within the treatment
Station       Sampling station within each plot
Species       Scientific name of the species in the trap
Family        Family that the species belongs to
Order         Order that the species belongs to
Adults        Number of adults of that species in the trap at sampling time  count
Location      UTM location of the trap (UTM zone 16N)                        meters
Year          Sample year
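For the analysis, the CSV can be read into typed records. Below is a minimal Python sketch using only the standard library. It assumes the header row uses exactly the variable names in the table above ("Sample Date", "Treatment", "Adults", etc.); the real column names in the figshare file may differ. The UTM location columns are left out because their names are not recorded here, and `TrapRecord`, `load_records` and the sample rows are hypothetical illustrations, not real data.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


# Hypothetical record type: field names mirror the variable table, but the
# actual CSV headers must be checked against the figshare file.
@dataclass
class TrapRecord:
    sample_date: str  # kept as text; the date format in the file is not assumed
    treatment: str    # treatment ID
    replicate: str    # replicate within the treatment
    station: str      # sampling station within the plot
    species: str      # scientific name
    adults: int       # number of adults in the trap at sampling time
    year: int         # sample year


def load_records(handle: TextIO) -> List[TrapRecord]:
    """Parse CSV rows into TrapRecord objects, skipping rows with no adult count."""
    records = []
    for row in csv.DictReader(handle):
        raw_adults = (row.get("Adults") or "").strip()
        if not raw_adults:
            continue  # one possible null-value policy, not a decided one
        records.append(TrapRecord(
            sample_date=row["Sample Date"],
            treatment=row["Treatment"],
            replicate=row["Replicate"],
            station=row["Station"],
            species=row["Species"],
            adults=int(raw_adults),
            year=int(row["Year"]),
        ))
    return records


# Tiny made-up sample (NOT real data) showing the assumed layout.
SAMPLE = """Sample Date,Treatment,Replicate,Station,Species,Family,Order,Adults,Year
6/15/2005,T1,R1,1,sp_A,Lampyridae,Coleoptera,2,2005
6/15/2005,T7,R2,3,sp_B,Lampyridae,Coleoptera,,2005
"""

if __name__ == "__main__":
    rows = load_records(io.StringIO(SAMPLE))
    print(len(rows), rows[0].adults)  # prints: 1 2
```

Dropping rows with an empty Adults field is only one possible null-value policy; whichever we choose belongs in the "Data processing description" field of the worksheet.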


Project description (abstract): Firefly (Lampyridae) abundance at the Kellogg Biological Station (KBS) LTER site from 2004 to 2015
Data set title (e.g. “Data from: ”, “Soil moisture data in Columbia Delta 1982”): LTER_lampyrid_data_20042015
Permanent ID (PID types include: DOI, PURL, ARK, handle, etc.): Unknown
Sources of data (if someone else’s data is included in your data set; preferably use a permanent identifier if available): KBS Long-Term Ecological Research (LTER) site
Subject area (e.g. Neurological biochemistry, applied ecology, etc.): Entomology
Related research publication (include full citation and permanent identifier, if available): Christie’s ladybug publication, http://link.springer.com/article/10.1007%2Fs10530-014-0772-4


Person/organization responsible for collecting data: Christie and her colleagues
Sponsoring or funding agency, grant number, and PI name/s & affiliations: GLBRC?
Collaborators (if applicable): ?

Contact person, their affiliation and contact info for questions about the data: Christie


Location where data was collected (use geographic coordinates if appropriate): KBS LTER Main Site http://lter.kbs.msu.edu/maps/images/current-lter-plot-map.pdf
Place of publication (e.g. institution or repository where data is made available): PeerJ


Dates of collection (specific date, date range): 2004-2015
Date of publication (when data was made publicly available): By the end of spring 2016


Data collection process (what instruments were used to collect the data? how frequently were the data collected? how were data collection sites selected? if there was a sample population, how was it selected?): Sticky traps at the KBS LTER Main Site

Data processing description (how did you clean the data? how are null values handled? did you write code for processing the data and where can it be found?)


File format (are there multiple formats? what software is needed to use the file/s?) NB: Avoid proprietary formats if possible! CSV file
File structure (if more than one file in dataset; include folder and file index, naming conventions, README files; provides context): 
Survey instruments (if any, include permanent identifier that points to the instrument if not included in files): Sticky traps
Field names and definitions (include units of measurement, formulas used for calculation, explain abbreviations): As defined by KBS LTER (see the variable table above)

2. Hypothesis and questions:

Preliminary test: By sorting the data and plotting firefly population against different variables (Logan is a quick plotter!), we found that population appears to correlate with year, tilling and landscape.
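The sorting step can be sketched in Python as a group-by-and-sum over the trap counts. The `(year, treatment, adults)` tuples, the helper name `totals_by` and the treatment labels "T1"/"T7" below are invented toy values for illustration, not numbers from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One observation per trap sample: (year, treatment, adults).
Observation = Tuple[int, str, int]


def totals_by(observations: Iterable[Observation], key_index: int) -> Dict:
    """Sum adult counts grouped by one field (0 = year, 1 = treatment)."""
    totals = defaultdict(int)
    for obs in observations:
        totals[obs[key_index]] += obs[2]
    return dict(sorted(totals.items()))


# Made-up toy rows, NOT real data.
toy = [
    (2004, "T1", 3), (2004, "T7", 5),
    (2005, "T1", 1), (2005, "T7", 4),
]

by_year = totals_by(toy, 0)       # {2004: 8, 2005: 5}
by_treatment = totals_by(toy, 1)  # {'T1': 4, 'T7': 9}
```

Raw totals also reflect sampling effort (how many traps and sampling dates there were in a given year or treatment), so per-trap means are probably fairer for the plots.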
We therefore hypothesized that firefly population depends on habitat type + year + organic management (organic or not) + (weather + temperature). To decouple the effects of the different factors, we may take a snapshot of each year and study habitat type and treatment (organic, tilling, etc.) within it.
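The per-year snapshot idea amounts to comparing treatments only within the same year, so that year-to-year swings in overall abundance are not mixed up with treatment effects. A minimal Python sketch, again on invented toy rows with hypothetical treatment labels; `yearly_snapshots` is an illustrative helper, not an existing function:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation per trap sample: (year, treatment, adults).
Observation = Tuple[int, str, int]


def yearly_snapshots(observations: List[Observation]) -> Dict[int, Dict[str, float]]:
    """For each year separately, compute mean adults per trap sample by treatment.

    Comparing treatments within a single year holds the year effect fixed.
    """
    grouped: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for year, treatment, adults in observations:
        grouped[year][treatment].append(adults)
    return {
        year: {trt: sum(c) / len(c) for trt, c in sorted(by_trt.items())}
        for year, by_trt in sorted(grouped.items())
    }


# Made-up toy rows, NOT real data.
toy = [
    (2004, "T1", 2), (2004, "T1", 4), (2004, "T4", 6),
    (2005, "T1", 0), (2005, "T4", 3), (2005, "T4", 5),
]
snap = yearly_snapshots(toy)
# snap == {2004: {'T1': 3.0, 'T4': 6.0}, 2005: {'T1': 0.0, 'T4': 4.0}}
```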
Questions remain: Do we need a model of population (dependent variable) as a function of year and habitat type (the two most significant independent variables)? If so, is there an empirical model we can take from previous papers? What parameters should we estimate? How many variables should we consider? If we are not going to develop a mathematical model, what statistical analysis should we run?
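One common starting point for count data like Adults is a Poisson generalized linear model (GLM) with a log link, log E[Adults] = b0 + b1 * (Year - first year). The sketch below fits that single-predictor model by Newton-Raphson in plain Python, only to show what "estimating parameters" would mean here; `fit_poisson_year` is a hypothetical helper, not an existing API. A real analysis would more likely use a library GLM (for example `statsmodels` in Python or `glm(..., family = poisson)` in R), add habitat type, treatment and weather as covariates, and check for overdispersion, switching to a negative binomial model if the variance is much larger than the mean.

```python
import math
from typing import Sequence, Tuple


def fit_poisson_year(years: Sequence[int], counts: Sequence[int],
                     max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log E[count] = b0 + b1 * (year - first_year) by Newton-Raphson.

    Returns (b0, b1); exp(b1) is the multiplicative change in the expected
    count per year under this deliberately minimal model.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one non-zero count")
    first = min(years)
    xs = [y - first for y in years]            # center the predictor on the first year
    b0 = math.log(sum(counts) / len(counts))   # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the Poisson log-likelihood)
        g0 = sum(y - m for y, m in zip(counts, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, counts, mus))
        # Fisher information (equal to the negative Hessian for the log link)
        i00 = sum(mus)
        i01 = sum(x * m for x, m in zip(xs, mus))
        i11 = sum(x * x * m for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        if det <= 0:
            raise ValueError("year has no variation; slope is not identifiable")
        # Newton step: solve the 2x2 system I * step = g
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (-i01 * g0 + i00 * g1) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1
```

With counts 1, 2 and 4 in three consecutive years, the fit recovers b0 ≈ 0 and b1 ≈ ln 2, i.e. the expected count doubles each year.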

