File name: Cody's_Data_Cleaning_Techniques_Using_SAS_(Second_Edition)
  Category: Lecture notes
  Development tool:
  File size: 925kb
  Downloads: 0
  Upload date: 2018-09-18
  Provider: yanghe******
 Detailed description: Table of Contents

 List of Programs
 Preface
 Acknowledgments

 Checking Values of Character Variables
  • Introduction
  • Using PROC FREQ to List Values
  • Description of the Raw Data File PATIENTS.TXT
  • Using a DATA Step to Check for Invalid Values
  • Describing the VERIFY, TRIM, MISSING, and NOTDIGIT Functions
  • Using PROC PRINT with a WHERE Statement to List Invalid Values
  • Using Formats to Check for Invalid Values
  • Using Informats to Remove Invalid Values

 Checking Values of Numeric Variables
  • Introduction
  • Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
  • Using an ODS SELECT Statement to List Extreme Values
  • Using PROC UNIVARIATE Options to List More Extreme Observations
  • Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage
  • Using PROC RANK to Look for Highest and Lowest Values by Percentage
  • Presenting a Program to List the Highest and Lowest Ten Values
  • Presenting a Macro to List the Highest and Lowest "n" Values
  • Using PROC PRINT with a WHERE Statement to List Invalid Data Values
  • Using a DATA Step to Check for Out-of-Range Values
  • Identifying Invalid Values versus Missing Values
  • Listing Invalid (Character) Values in the Error Report
  • Creating a Macro for Range Checking
  • Checking Ranges for Several Variables
  • Using Formats to Check for Invalid Values
  • Using Informats to Filter Invalid Values
  • Checking a Range Using an Algorithm Based on Standard Deviation
  • Detecting Outliers Based on a Trimmed Mean and Standard Deviation
  • Presenting a Macro Based on Trimmed Statistics
  • Using the TRIM Option of PROC UNIVARIATE and ODS to Compute Trimmed Statistics
  • Checking a Range Based on the Interquartile Range

 Checking for Missing Values
  • Introduction
  • Inspecting the SAS Log
  • Using PROC MEANS and PROC FREQ to Count Missing Values
  • Using DATA Step Approaches to Identify and Count Missing Values
  • Searching for a Specific Numeric Value
  • Creating a Macro to Search for Specific Numeric Values

 Working with Dates
  • Introduction
  • Checking Ranges for Dates (Using a DATA Step)
  • Checking Ranges for Dates (Using PROC PRINT)
  • Checking for Invalid Dates
  • Working with Dates in Nonstandard Form
  • Creating a SAS Date When the Day of the Month Is Missing
  • Suspending Error Checking for Known Invalid Dates

 Looking for Duplicates and "n" Observations per Subject
  • Introduction
  • Eliminating Duplicates by Using PROC SORT
  • Detecting Duplicates by Using DATA Step Approaches
  • Using PROC FREQ to Detect Duplicate ID's
  • Selecting Patients with Duplicate Observations by Using a Macro List and SQL
  • Identifying Subjects with "n" Observations Each (DATA Step Approach)
  • Identifying Subjects with "n" Observations Each (Using PROC FREQ)

 Working with Multiple Files
  • Introduction
  • Checking for an ID in Each of Two Files
  • Checking for an ID in Each of "n" Files
  • A Macro for ID Checking
  • More Complicated Multi-File Rules
  • Checking That the Dates Are in the Proper Order

 Double Entry and Verification (PROC COMPARE)
  • Introduction
  • Conducting a Simple Comparison of Two Data Sets
  • Using PROC COMPARE with Two Data Sets That Have an Unequal Number of Observations
  • Comparing Two Data Sets When Some Variables Are Not in Both Data Sets

 Some PROC SQL Solutions to Data Cleaning
  • Introduction
  • A Quick Review of PROC SQL
  • Checking for Invalid Character Values
  • Checking for Outliers
  • Checking a Range Using an Algorithm Based on the Standard Deviation
  • Checking for Missing Values
  • Range Checking for Dates
  • Checking for Duplicates
  • Identifying Subjects with "n" Observations Each
  • Checking for an ID in Each of Two Files
  • More Complicated Multi-File Rules

 Correcting Errors
  • Introduction
  • Hardcoding Corrections
  • Describing Named Input
  • Reviewing the UPDATE Statement

 Creating Integrity Constraints and Audit Trails
  • Introducing SAS Integrity Constraints
  • Demonstrating General Integrity Constraints
  • Deleting an Integrity Constraint Using PROC DATASETS
  • Creating an Audit Trail Data Set
  • Demonstrating an Integrity Constraint Involving More than One Variable
  • Demonstrating a Referential Constraint
  • Attempting to Delete a Primary Key When a Foreign Key Still Exists
  • Attempting to Add a Name to the Child Data Set
  • Demonstrating the Cascade Feature of a Referential Constraint
  • Demonstrating the SET NULL Feature of a Referential Constraint
  • Demonstrating How to Delete a Referential Constraint

 DataFlux and dfPower Studio
  • Introduction
  • Examples

 Listing of Raw Data Files and SAS Programs
  • Programs and Raw Data Files Used in This Book
  • Description of the Raw Data File PATIENTS.TXT
  • Layout for the Data File PATIENTS.TXT
  • Listing of Raw Data File PATIENTS.TXT
  • Program to Create the SAS Data Set PATIENTS
  • Listing of Raw Data File PATIENTS2.TXT
  • Program to Create the SAS Data Set PATIENTS2
  • Program to Create the SAS Data Set AE (Adverse Events)
  • Program to Create the SAS Data Set LAB_TEST
  • Listings of the Data Cleaning Macros Used in This Book