2021 Data Mining for Transportation (Southeast University): Latest Full-Score Chapter Test Answers

Contents

Week 1. Introduction to data mining Test 1
Week 2. Data pre-processing Test 2
Week 3. Instance based learning Test 3
Week 4. Decision Trees Test 4
Week 5. Support Vector Machine Test 5
Week 6. Outlier Mining Test 6
Week 7. Ensemble Learning Test 7
Week 8. Clustering Test 8

Course dates: 2021-03-01 to 2021-06-30
Answer update status: complete

Week 1. Introduction to data mining Test 1

1. Question: Which one is not a description of data mining?
Options:
A: Extraction of interesting patterns or knowledge
B: Explorations and analysis by automatic or semi-automatic means
C: Discover meaningful patterns from large quantities of data
D: Appropriate statistical analysis methods to analyze the data collected
Answer: 【Appropriate statistical analysis methods to analyze the data collected】

2. Question: Which one describes the right process of knowledge discovery?
Options:
A: Selection-Preprocessing-Transformation-Data mining-Interpretation/Evaluation
B: Preprocessing-Transformation-Data mining-Selection-Interpretation/Evaluation
C: Data mining-Selection-Interpretation/Evaluation-Preprocessing-Transformation
D: Transformation-Data mining-Selection-Preprocessing-Interpretation/Evaluation
Answer: 【Selection-Preprocessing-Transformation-Data mining-Interpretation/Evaluation】

3. Question: Which one does not belong to the KDD process?
Options:
A: Data mining
B: Data description
C: Data cleaning
D: Data selection
Answer: 【Data description】

4. Question: Which one is not a correct alternative name for data mining?
Options:
A: Knowledge extraction
B: Data archeology
C: Data dredging
D: Data harvesting
Answer: 【Data harvesting】

5. Question: Which one is not a nominal variable?
Options:
A: Occupation
B: Education
C: Age
D: Color
Answer: 【Age】

6. Question: Which one is wrong about classification and regression?
Options:
A: Regression analysis is a statistical methodology that is most often used for numeric prediction.
B: We can construct classification models (functions) without some training examples.
C: Classification predicts categorical (discrete, unordered) labels.
D: Regression models predict continuous-valued functions.
Answer: 【We can construct classification models (functions) without some training examples.】

7. Question: Which one is wrong about clustering and outliers?
Options:
A: Clustering belongs to supervised learning.
B: Principles of clustering include maximizing intra-class similarity and minimizing inter-class similarity.
C: Outlier analysis can be useful in fraud detection and rare events analysis.
D: Outlier means a data object that does not comply with the general behavior of the data.
Answer: 【Clustering belongs to supervised learning.】

8. Question: Which statement about data processing is wrong?
Options:
A: When making data discrimination, we compare the target class with one or a set of comparative classes (the contrasting classes).
B: When making data classification, we predict categorical labels excluding unordered ones.
C: When making data characterization, we summarize the data of the class under study (the target class) in general terms.
D: When making data clustering, we would group data to form new categories.
Answer: 【When making data classification, we predict categorical labels excluding unordered ones.】

9. Question: Outlier mining methods, such as density-based methods, belong to supervised learning.
Options:
A: True
B: False
Answer: 【False】

10. Question: Support vector machines can be used for classification and regression.
Options:
A: True
B: False
Answer: 【True】

Week 2. Data pre-processing Test 2

1. Question: Which is not a reason we need to preprocess the data?
Options:
A: to save time
B: to make result meet our hypothesis
C: to avoid unreliable output
D: to eliminate noise
Answer: 【to make result meet our hypothesis】

2. Question: Which is not a major task in data preprocessing?
Options:
A: Clean
B: Integration
C: Transition
D: Reduction
Answer: 【Transition】

3. Question: How is the new feature space constructed by PCA?
Options:
A: New feature space by PCA is constructed by choosing the most important features you think.
B: New feature space by PCA is constructed by normalizing input data.
C: New feature space by PCA is constructed by selecting features randomly.
D: New feature space by PCA is constructed by eliminating the weak components to reduce the size of the data.
Answer: 【New feature space by PCA is constructed by eliminating the weak components to reduce the size of the data.】

4. Question: Which one is wrong about methods for discretization?
Options:
A: Histogram analysis and binning are both unsupervised methods.
B: Clustering analysis only belongs to top-down split.
C: Interval merging by χ² analysis can be applied recursively.
D: Decision-tree analysis is entropy-based discretization.
Answer: 【Clustering analysis only belongs to top-down split.】

5. Question: Which one is wrong about equal-width (distance) partitioning and equal-depth (frequency) partitioning?
Options:
A: Equal-width partitioning is the most straightforward, but outliers may dominate presentation.
B: Equal-depth partitioning divides the range into N intervals, each containing approximately same number of samples.
C: The interval of the former one is not equal.
D: The number of tuples is the same when using the latter one.
Answer: 【The interval of the former one is not equal.】
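
The contrast between the two partitioning schemes in this question can be sketched in a few lines of Python. This is only an illustration: the function names and the `speeds` data are hypothetical and not part of the course material, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Split the value range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # The maximum value lands on the last edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


def equal_depth_bins(values, n_bins):
    """Split the sorted values into n_bins groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic spreads any remainder across the bins.
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


# Hypothetical data; 200 is an outlier.
speeds = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 200]
width_bins = equal_width_bins(speeds, 3)
depth_bins = equal_depth_bins(speeds, 3)
print([len(b) for b in width_bins])  # bin sizes under equal-width partitioning
print([len(b) for b in depth_bins])  # bin sizes under equal-depth partitioning
```

With this data the single outlier stretches the equal-width intervals so that almost every value falls into the first bin, while the equal-depth bins each hold the same number of values.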

6. Question: Which one is not a valid way to normalize data?
Options:
A: Min-max normalization
B: Simple scaling
C: Z-score normalization
D: Normalization by decimal scaling
Answer: 【Simple scaling】
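
For reference, the three normalization methods named in the options can be sketched as follows. The function names and the `flows` values are hypothetical, the z-score version uses the population standard deviation as one possible convention, and the sketch assumes the input is not constant.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, with j the smallest integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


flows = [200, 400, 600, 800, 1000]  # hypothetical traffic volumes
print(min_max(flows))
print(z_score(flows))
print(decimal_scaling(flows))
```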

7. Question: Which are valid ways to fill in missing values?
Options:
A: Smart mean
B: Probable value
C: Ignore
D: Falsify
Answer: 【Smart mean; Probable value; Ignore】

8. Question: Which are valid ways to handle noisy data?
Options:
A: Regression
B: Cluster
C: WT
D: Manual
Answer: 【Regression; Cluster; WT; Manual】

9. Question: Which statements are right about wavelet transforms?
Options:
A: Wavelet transforms store large fractions of the strongest of the wavelet coefficients.
B: The DWT decomposes each segment of time series via the successive use of low-pass and high-pass filtering at appropriate levels.
C: Wavelet transforms can be used for reducing data and smoothing data.
D: Wavelet transforms means applying to pairs of data, resulting in two sets of data of the same length.
Answer: 【The DWT decomposes each segment of time series via the successive use of low-pass and high-pass filtering at appropriate levels.; Wavelet transforms can be used for reducing data and smoothing data.】
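
As a rough illustration of the low-pass and high-pass filtering mentioned in this question, here is one level of a Haar transform, the simplest wavelet. It uses the unnormalized averaging convention, which is an assumption of this sketch; the function names and the `series` values are hypothetical.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform on an even-length list."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]  # low-pass: pairwise averages
    detail = [(a - b) / 2 for a, b in pairs]  # high-pass: pairwise half-differences
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from one level of averages and half-differences."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([s + d, s - d])
    return out


series = [2, 2, 0, 2, 3, 5, 4, 4]  # hypothetical time series
approx, detail = haar_step(series)

# Smoothing: zero out the small detail coefficients, then invert.
threshold = 1
kept = [d if abs(d) > threshold else 0 for d in detail]
smoothed = haar_inverse(approx, kept)
print(approx, detail, smoothed)
```

Each output list is half the length of the input. Keeping only the approximation and the strongest detail coefficients is what makes the transform usable for both data reduction and smoothing.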

10. Question: Which are commonly used sampling methods?
Options:
A: Simple random sample without replacement
B: Simple random sample with replacement
C: Stratified sample
D: Cluster sample
Answer: 【Simple random sample without replacement; Simple random sample with replacement; Stratified sample; Cluster sample】
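
The four sampling schemes listed in this question can be sketched with the standard `random` module. The `records` data, the `road_type` field, and the helper names are hypothetical, and the cluster sampler simply treats consecutive blocks of rows as clusters.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical records: 20 highway rows and 80 urban rows.
records = [{"id": i, "road_type": "highway" if i < 20 else "urban"} for i in range(100)]

# Simple random sample without replacement: no record can be drawn twice.
srswor = rng.sample(records, 10)

# Simple random sample with replacement: the same record may appear more than once.
srswr = rng.choices(records, k=10)


def stratified_sample(rows, key, fraction, rng):
    """Draw the same fraction from every stratum defined by rows[key]."""
    strata = {}
    for row in rows:
        strata.setdefault(row[key], []).append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


def cluster_sample(rows, cluster_size, n_clusters, rng):
    """Group rows into consecutive clusters, then keep whole clusters chosen at random."""
    clusters = [rows[i:i + cluster_size] for i in range(0, len(rows), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [row for cluster in chosen for row in cluster]


stratified = stratified_sample(records, "road_type", 0.1, rng)
clustered = cluster_sample(records, 10, 2, rng)
print(len(srswor), len(srswr), len(stratified), len(clustered))
```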

11. Question: Discretization means dividing the range of a continuous attribute into intervals.
Options:
A: True
B: False
Answer: 【True】

Week 3. Instance based learning Test 3

1. Question: What's the difference between eager learners and lazy learners?
Options:
A: Eager learners would generate a model for classification while lazy learners would not.
B: Eager learners classify the tuple based on its similarity to the stored training tuples while lazy learners do not.
C: Eager learners simply store data (or do only a little minor processing) while lazy learners do not.
D: Lazy learners would generate a model for classification while eager learners would not.
Answer: 【Eager learners would generate a model for classification while lazy learners would not.】
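
A k-nearest-neighbour classifier is a common example of a lazy learner: "training" only stores the tuples, and all the work happens at prediction time. The sketch below is a minimal illustration; the class name, the (speed, occupancy) features, and the labels are hypothetical.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-nearest-neighbour classifier using Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: no model is built, the training tuples are just stored.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # Rank stored tuples by distance to the query and vote among the k nearest.
        ranked = sorted(zip(self.X, self.y), key=lambda pair: math.dist(pair[0], x))
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


# Hypothetical (speed km/h, occupancy %) observations.
X = [(60, 10), (65, 12), (58, 15), (20, 60), (15, 70), (25, 65)]
y = ["free", "free", "free", "congested", "congested", "congested"]

knn = KNNClassifier(k=3).fit(X, y)
print(knn.predict((55, 14)))
print(knn.predict((18, 66)))
```

An eager learner such as a decision tree would instead build its model inside `fit` and could discard the training tuples afterwards.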