CEDAR Data Model

Version: 1.0 Last Updated: January 2026

Overview

CEDAR uses a normalized data model designed for enrollment analytics in higher education. This model is institution-agnostic - while Cedar was built using MyReports data from UNM, any institution can map their data sources to these tables.

Why a Standardized Model?

Instead of working directly with vendor-specific report formats (MyReports, Banner, Canvas, etc.), CEDAR defines its own data schema with only the columns needed for analytics. This approach:

  • Reduces memory usage by 60-70% (loads only needed columns)
  • Speeds up startup from 10-15 seconds to 3-5 seconds
  • Simplifies code - all analytics reference consistent column names
  • Enables portability - institutions can map their data without changing CEDAR code
  • Improves maintainability - vendor changes don’t break your analytics

Core Tables

CEDAR requires 5 core tables. Each table is described below with:

  • Purpose: What this table represents
  • Key columns: Required fields
  • Optional columns: Helpful but not required
  • Relationships: How it connects to other tables

1. cedar_sections (Course Offerings)

Purpose: One row per course section per term (e.g., MATH 1350-001 in Fall 2025)

Required Columns

These columns are used throughout Cedar code - features will break without them

Column Type Description Example Usage Count
term integer Academic term code 202580 Core filter
crn string Course Reference Number “12345” Section identifier
subject string Subject code “MATH” Subject filter
course_number string Course number “1350” Course identifier
subject_course string Combined subject + course “MATH 1350” 239 references!
section string Section number “001” Section filter
course_title string Course title “Calculus I” Display
campus string Campus code “Main”, “ABQ”, “Online” Filter (34 uses)
college string College code “AS” (Arts & Sciences) Filter (30 uses)
department string Department code “MATH” Filter (58 uses)
instructor_id string Primary instructor ID “123456” FK to faculty
instructor_name string Instructor full name “Smith, John” Instructor filter
enrolled integer Current enrollment 28 Analytics (11 uses)
capacity integer Maximum enrollment 30 Seat analysis
status string Section status “A” (Active), “C” (Cancelled) Active filter
delivery_method string Delivery mode “F2F”, “Online”, “Hybrid” Method filter
level string Course level “lower”, “upper”, “grad” Level filter (75 uses!)
term_type string Term type “fall”, “spring”, “summer” Forecasting (82 uses!)
part_term string Part of term “1H”, “2H”, “FT” Seatfinder analysis
gen_ed_area integer Gen Ed category code 1, 2, 3, 4, 5, 7 Gen Ed filter (19 uses)
is_lab boolean Lab section flag TRUE/FALSE Lab identification
as_of_date date When data was extracted “2025-01-10” Data freshness

Important Notes:

  • subject_course is created by combining subject + course_number (used 239 times!)
  • level, term_type, gen_ed_area, is_lab are derived during parsing but heavily used
  • instructor_name can be derived from first/last name fields if you have those

Optional Columns

Nice to have but not required for core functionality

Column Type Description Example
waitlist_count integer Students on waitlist 5
waitlist_capacity integer Max waitlist size 10
start_date date Section start date “2025-08-20”
end_date date Section end date “2025-12-15”
credits_min numeric Minimum credits 3.0
credits_max numeric Maximum credits 3.0
crosslist_primary boolean Is primary crosslist section TRUE/FALSE
crosslist_group string Crosslist group ID “XL-12345”
room string Room number “MESA 101”
building string Building code “MESA”
days string Meeting days “MWF”
times string Meeting times “10:00AM-10:50AM”

Computed Columns (for reference)

These are derived in parsers from other columns. Document how to compute them.

Column How Computed From
level Based on course_number <300=”lower”, 300-499=”upper”, 500-699=”grad”, ≥1000=”lower”
is_lab Check for letter suffix grepl("[A-Z]$", course_number)
term_type From term code last 2 digits 10=”spring”, 60=”summer”, 80=”fall”
gen_ed_area Map course to category Check if subject_course in gen_ed lists
instructor_name Combine name fields paste(last_name, first_name, sep=", ")

2. cedar_students (Student Registrations)

Purpose: One row per student per course section (class lists)

Required Columns

These columns are used throughout Cedar code - features will break without them

Column Type Description Example Usage Count
enrollment_id string Unique identifier Auto-increment or hash -
section_id string FK to cedar_sections “202580-12345” Join key
student_id string Encrypted student ID Hash of real ID 19 uses
term integer Academic term code 202580 113 uses!
subject_course string Course (denormalized) “MATH 1350” Pathway analysis
course_title string Course title “Topics in History” Display, distinguish same-number courses
campus string Course campus “Main”, “ABQ” 72 uses!
college string Course college “AS” 45 uses!
department string Course department “MATH” Filtering
registration_status string Enrollment status “Registered”, “Dropped” 3 uses
registration_status_code string Status code “RE”, “RS”, “DR”, “W” 14 uses
final_grade string Final grade (if term complete) “A”, “B+”, “W”, “I” 40 uses!
student_level string Student level “UG”, “GR” 19 uses
student_classification string Class standing “FR”, “SO”, “JR”, “SR” 3 uses
major string Student’s major code “MATH-BS” Headcount
student_college string Student’s college code “AS” 4 uses
student_campus string Student’s campus “Main” 9 uses
term_type string Term type (denormalized) “fall”, “spring” Rollcall analysis
as_of_date date When data was extracted “2025-01-10” Data freshness

Important Notes:

  • student_id must be encrypted/hashed - never store plaintext student IDs!
  • term is most-used column (113 references) - absolutely critical
  • campus (72 uses) and college (45 uses) are heavily filtered
  • Some columns like subject_course, campus, college, term_type are denormalized from sections for query performance

Optional Columns

Enhance functionality but not strictly required

Column Type Description Example
credits numeric Credits student is taking 3.0
registration_date date When student registered “2025-04-15”
drop_date date When student dropped “2025-09-01”
residency string In-state/out-of-state “Resident”
dual_credit boolean Dual credit student TRUE/FALSE

Important Notes

  • Privacy: student_id must be encrypted/hashed to protect student privacy
  • Relationship: Links to cedar_sections via section_id
  • Size: This is typically the largest table (millions of rows)

3. cedar_programs (Student Academic Programs)

Purpose: Student major, minor, concentration enrollment by term

Required Columns

Column Type Description Example
program_id string Unique identifier Auto-increment
student_id string Encrypted student ID Hash
term integer Academic term 202580
program_type string Type of program “Major”, “Minor”, “Concentration”
program_name string Program full name “Mathematics BS”
college string College offering program “AS”
department string Department name from MyReports “AS Anthropology”, “Physics Astronomy”
student_level string Student academic level “UG”, “GR”
student_college string Student’s home college “AS”
student_campus string Student’s campus “Main”
as_of_date date When data was extracted “2025-01-10”

Optional Columns

Column Type Description Example
degree string Degree type “BS”, “BA”, “MS”, “PhD”
classification string Program classification “Undergraduate”, “Graduate”
catalog_year integer Catalog student follows 202580
program_status string Active, graduated, etc. “Active”
declared_date date When program was declared “2023-09-01”

4. cedar_degrees (Graduates)

Purpose: Awarded degrees and pending graduates

Required Columns

Column Type Description Example
degree_id string Unique identifier Auto-increment
student_id string Encrypted student ID Hash
term integer Graduation term 202510
degree string Degree awarded “BS”, “BA”, “MS”, “PhD”
program_name string Program name “Mathematics BS”
college string College “AS”
department string Department “MATH”
graduation_status string Status “Conferred”, “Pending”, “Applied”
as_of_date date When data was extracted “2025-01-10”

Optional Columns

Column Type Description Example
campus string Campus “Main”
major string Major name “Mathematics”
second_major string Second major (if applicable) “Physics”
minor string Minor “Computer Science”
cumulative_gpa numeric Final GPA 3.67
cumulative_credits numeric Total credits earned 128
honors string Graduation honors “Summa Cum Laude”, “Cum Laude”
admitted_term integer When student first enrolled 202180

5. cedar_faculty (Faculty HR Data)

Purpose: Faculty appointment and job category data for instructor analysis, particularly student-faculty ratios and DFW analysis by instructor type.

Source: Transformed from HR reports via transform-hr-to-cedar.R

Used by: sfr.R (student-faculty ratios), gradebook.R (DFW analysis by instructor type)

Required Columns

Column Type Description Example Usage Count
instructor_id string Encrypted UNM ID (matches cedar_students) “abc123…” Join key
instructor_name string Full name “Smith, John D.” Display
term integer Academic term 202580 Join key (34 uses)
department string Home department code “MATH” Filter (28 uses)
job_category string Employment category “Professor”, “Lecturer”, “Term Teacher” Analytics (15 uses)
appointment_pct numeric Appointment percentage as decimal 1.0 (100%), 0.5 (50%) FTE calculation
as_of_date date When HR data was processed “2025-01-10” Data freshness

Important Notes:

  • department uses department codes from HR data (e.g., “MATH”, “ANTH”)
  • job_category uses Title Case values from parse-HRreport.R (see values below)
  • appointment_pct is stored as decimal 0.0-1.0, not percentage (e.g., 0.5 = 50%)
  • instructor_id is encrypted to match the encryption used in cedar_sections

Job Categories

The job_category field uses values from parse-HRreport.R:

Category Description Counted in SFR?
Professor Full professor ✅ Yes (permanent)
Associate Professor Associate professor ✅ Yes (permanent)
Assistant Professor Assistant professor ✅ Yes (permanent)
Lecturer Lecturer (non-tenure track) ✅ Yes (permanent)
Term Teacher Term teacher ❌ No (temporary)
TPT Temporary part-time ❌ No (temporary)
Grad Graduate assistant ❌ No (temporary)
Professor Emeritus Emeritus professor ❌ No (non-active)

For SFR calculations, only permanent faculty (Professor, Associate Professor, Assistant Professor, Lecturer) are counted as part of the faculty FTE denominator.

Optional Columns (Retained for Reference)

Column Type Description Example
academic_title string Original academic title from HR “Assistant Professor”
job_title string Job title from HR “Assistant Professor”
college string Home organization description from HR “AS Mathematics & Statistics”

Transformation Details

The cedar_faculty table is created by transform-to-cedar.R which:

  1. Loads hr_data.Rds (output from parse-HRreport.R)
  2. Normalizes column names to CEDAR conventions:
    • UNM IDinstructor_id (encrypted)
    • term_codeterm (integer)
    • DEPTdepartment (department code)
    • job_catjob_category (Title Case values)
    • Appt %appointment_pct (decimal 0.0-1.0)
    • Home Organization Desccollege
  3. Saves to cedar_faculty.Rds

Example Queries

Calculate permanent faculty FTE by department:

cedar_faculty %>%
  filter(job_category %in% c("professor", "associate_professor",
                              "assistant_professor", "lecturer")) %>%
  group_by(term, department) %>%
  summarize(fte = sum(appointment_pct))

Count faculty by job category:

cedar_faculty %>%
  filter(term == 202580) %>%
  count(job_category, sort = TRUE)

Merge with grade data for DFW analysis by instructor type:

# Used in gradebook.R
grade_counts %>%
  left_join(cedar_faculty, by = c("instructor_id", "term")) %>%
  group_by(subject_course, job_category) %>%
  summarize(dfw_pct = sum(failed) / sum(passed + failed))

Relationships Between Tables

cedar_sections
    ├─► cedar_students (via section_id)
    │       └─► cedar_programs (via student_id)
    │       └─► cedar_degrees (via student_id)
    └─► cedar_faculty (via instructor_id)

Key Foreign Key Relationships

  1. sections → enrollments: cedar_sections.section_id = cedar_students.section_id
  2. sections → faculty: cedar_sections.instructor_id = cedar_faculty.instructor_id
  3. enrollments → programs: cedar_students.student_id = cedar_programs.student_id
  4. enrollments → degrees: cedar_students.student_id = cedar_degrees.student_id

Data Size Expectations

Typical data volumes for a mid-sized university (~25,000 students):

Table Rows per Term Total (5 years) Memory (approx)
cedar_sections 5,000 - 8,000 40,000 - 60,000 5-10 MB
cedar_students 100,000 - 150,000 750,000 - 1M 60-100 MB
cedar_programs 30,000 - 40,000 200,000 - 300,000 15-25 MB
cedar_degrees 5,000 - 8,000 40,000 - 60,000 3-5 MB
cedar_faculty 1,500 - 2,500 10,000 - 15,000 1-2 MB
Total     ~100 MB

Compare to current MyReports format: ~300MB for same data!


Common Queries

Get enrollment by department

cedar_sections %>%
  filter(term == 202580, status == "A") %>%
  group_by(department) %>%
  summarize(total_enrollment = sum(enrolled))

Get student’s course history

cedar_students %>%
  left_join(cedar_sections, by = "section_id") %>%
  filter(student_id == "hashed_id") %>%
  select(term, subject, course_number, section, final_grade)

Count majors by program

cedar_programs %>%
  filter(term == 202580, program_type == "Major") %>%
  group_by(program_name) %>%
  summarize(headcount = n_distinct(student_id))

DFW rates by course

cedar_students %>%
  left_join(cedar_sections, by = "section_id") %>%
  filter(term >= 202080, final_grade %in% c("D", "F", "W")) %>%
  group_by(subject, course_number) %>%
  summarize(
    dfw_count = n(),
    total_count = n_distinct(enrollment_id)
  ) %>%
  mutate(dfw_rate = dfw_count / total_count)

Column Naming Conventions

CEDAR uses snake_case for all column names to ensure consistency:

  • student_id, course_title, enrollment_date
  • StudentID, CourseTitle, Enrollment Date (avoid)

Standard Abbreviations

Abbreviation Meaning
id Identifier (primary key or foreign key)
crn Course Reference Number
term Academic term code (YYYYCC format)
dept Department
pct Percentage
enrl / enrolled Enrollment
max Maximum
min Minimum
avg Average

Data Types

Follow these conventions for consistency across institutions:

Type R type Description Example
Identifiers character Always string, even if numeric “12345”, not 12345
Term codes integer 6-digit term code 202580 (Fall 2025)
Counts integer Whole numbers 28, 150
Percentages numeric Decimals 0-1 0.93 (not 93)
Dates Date Standard date type “2025-08-20”
Flags logical TRUE/FALSE TRUE, not “Y”

Privacy & Security

Student ID Encryption

CRITICAL: Never store plaintext student IDs. Always encrypt/hash before saving to CEDAR tables.

# Example encryption (use stronger method in production)
library(digest)
encrypt_student_id <- function(id) {
  digest(paste0(id, Sys.getenv("CEDAR_SALT")), algo = "sha256")
}

# In transformation
cedar_students <- class_lists %>%
  mutate(student_id = encrypt_student_id(`Student ID`))

Sensitive Columns

Mark these as optional or exclude entirely based on your institution’s policies:

  • Email addresses
  • Student addresses
  • Social Security Numbers (NEVER include)
  • Detailed demographic data beyond aggregated reporting needs

Next Steps

  1. Read the transformation guide: See data-transformation-myreports.md for how to map MyReports → CEDAR
  2. Review sample data: Check data/samples/ for example CEDAR tables
  3. Run validation: Use validate_cedar_data() function to check your tables
  4. Start using: Load CEDAR tables instead of raw vendor data in your analytics

Questions?

  • How do I map my data? See institution-specific transformation guides in docs/transformations/
  • What if I don’t have a column? Many columns are optional - provide what you have
  • Can I add custom columns? Yes! Add institution-specific columns as needed
  • How do I validate? Run source("R/data-validation.R"); validate_cedar_tables()

Back to top

CEDAR is open source software for higher education analytics.