Customization Bias in Decision Support Systems by Jacob Solomon
- user satisfaction improves with customizability; is it a good design choice for decision support systems?
- data -> system -> recommendation -> decision maker -> decision
- some systems support customization; customization -> recommendation quality -> decision quality; is this always true?
- customization bias: bias that arises because the decision maker has a part in driving the recommendation; reduces the ability to evaluate the quality of the recommendation; supports confirmation bias
- experiment: fantasy baseball; predict scores assisted by DSS; one group could adjust statistical categories used, other couldn't; recommendations were predetermined, no algorithm, and both got same recommendations; subjects received 8 good recommendations and 4 poor recommendations; 99 MTurk participants with fair baseball knowledge
- findings: customizers had slightly better recommendations, though that was not the point of the study; customizers were more likely to agree with the system; more likely to agree if the recommendation was consistent with their customization (confirmation bias); customization can enhance trust in the system but that trust is sometimes misplaced; ties decision quality more closely to recommendation quality (whether the system gives good or poor recommendations)
Structured Labeling for Facilitating Concept Evolution in Machine Learning by Todd Kulesza
- data needs to be labeled for machine to distinguish; people don't always label consistently; concept evolution – mentally define and refine concept
- study: can we detect concept evolution; 9 experts, 200 pages, twice with 2 weeks in between; experts were only 81% consistent with prior labeling
- can we help people define and refine a concept while labeling? added a 'could be' choice alongside yes and no to allow refinement later, once the concept has firmed up; participants often didn't name their groups, so the system provides automated summaries; they forgot how they had handled similar pages, so the system recommends a group automatically; they weren't sure some pages were worth structuring, so the system shows similar future pages
- study: 15 participants, 200 pages, 20 minutes, 3 simple categories; conditions of no structure, manual structure, and assisted structure
- findings: manual structuring created many more groups than automated; also made many more adjustments in the first half of the experiment, fewer later; manual structuring more than tripled consistency and assisted structuring almost tripled it; took longer than baseline to label early items, but not longer for later items; participants preferred structured and assisted over baseline; easier to verify a recommendation than to come up with their own
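A minimal sketch of the structured-labeling idea: yes/no/"could be" labels plus named groups, with a group recommender for similar pages. The class, names, and the token-overlap similarity are illustrative assumptions, not the paper's implementation:

```python
# Sketch of structured labeling (illustrative, not the paper's system):
# items get yes/no/"could be" labels and are placed into groups; a naive
# Jaccard token-overlap score recommends a group for a similar new page.

def jaccard(a, b):
    """Token-overlap similarity between two strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

class StructuredLabeler:
    def __init__(self):
        self.groups = {}  # group name -> list of (text, label)

    def label(self, text, label, group):
        assert label in ("yes", "no", "could be")
        self.groups.setdefault(group, []).append((text, label))

    def recommend_group(self, text):
        """Suggest the existing group with the most similar item, if any."""
        best, best_score = None, 0.0
        for name, items in self.groups.items():
            score = max(jaccard(text, t) for t, _ in items)
            if score > best_score:
                best, best_score = name, score
        return best
```

The "could be" label defers the decision until the concept has evolved; the recommender addresses "forgot what they did with a similar page".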
Choice-Based Preference Elicitation for Collaborative Filtering Recommender Systems by Benedikt Loepp
- recommendation system: select items from a large set that match a user's interests; collaborative filtering is the most popular approach and is effective; criticized because the focus is only on improving algorithms rather than improving the user's role and satisfaction; also, at the beginning there is no data to work from (cold start); ratings are inaccurate, comparisons are effective, but choosing good comparisons depends on preexisting data
- goal: improve user effectiveness and control; generate a series of choices based on the most important factors in a matrix factorization; items in each choice set must be frequently rated, highly diverse on the factor being elicited, and similar on the non-choice factors
- evaluation: balance automatic recommendation and manual exploration; test 4 different user interfaces – popular, manual exploration, automatic recommendation, choice based model; 35 participants using each method to choose six movies + survey
- results: choice based significantly better than other models in all dimensions but required more effort than popular; good cost-benefit ratio; users felt in control; no profile or additional data required; works well for experience-based products
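The choice-construction criteria above can be sketched against synthetic matrix-factorization output; the item factors, rating counts, and scoring rule below are invented for illustration:

```python
import numpy as np

# Sketch of choice-set construction from matrix-factorization item factors,
# following the criteria in the notes: frequently rated items, far apart on
# the factor being elicited, close on the other factors. Synthetic data.

rng = np.random.default_rng(0)
item_factors = rng.normal(size=(50, 3))    # 50 items, 3 latent factors
rating_counts = rng.integers(1, 500, 50)   # how often each item was rated

def choice_set(factor, min_ratings=100):
    """Pick a pair far apart on `factor` but close on the other factors."""
    frequent = np.flatnonzero(rating_counts >= min_ratings)
    other = [f for f in range(item_factors.shape[1]) if f != factor]
    best, best_score = None, -np.inf
    # brute force over pairs; a real system would search more cleverly
    for i in frequent:
        for j in frequent:
            if i >= j:
                continue
            spread = abs(item_factors[i, factor] - item_factors[j, factor])
            closeness = np.linalg.norm(item_factors[i, other] - item_factors[j, other])
            score = spread - closeness
            if score > best_score:
                best, best_score = (int(i), int(j)), score
    return best
```

Spreading items along one factor while holding the others similar makes each user choice informative about exactly that factor.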
ARchitect: Finding Dependencies Between Actions Using the Crowd by Walter Lasecki
- activity recognition: system recognizing what you are doing; eg helping people who may need assistance in daily living; automated systems need a lot of training data, whereas people can recognize activities very easily; crowdsource labels from Legion:AR; still many permutations of behavior must be recorded and labeled
- approach: define dependency structure to constrain meaningful variations
- ARchitect: ask yes/no questions about different permutations of action steps to build valid models; eg 3 videos led to 22 valid models
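A toy illustration of the dependency idea: a "must come before" relation over action steps constrains which orderings count as valid models. The steps and dependencies below are made up, and the enumeration is brute force:

```python
from itertools import permutations

# Illustrative dependency structure over hypothetical tea-making steps;
# a -> b means a must happen before b (as crowd yes/no answers might say).
steps = ["fill kettle", "boil water", "add tea", "pour water"]
deps = {("fill kettle", "boil water"),
        ("boil water", "pour water"),
        ("add tea", "pour water")}

def valid_orderings():
    """Enumerate permutations of steps that respect every dependency."""
    out = []
    for order in permutations(steps):
        pos = {s: i for i, s in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in deps):
            out.append(order)
    return out
```

Only 3 of the 24 permutations survive here, which is the point: a small dependency structure rules out most meaningless variations without recording each one.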
Scalable Multi-label Annotation by Alex Berg
- multi-label annotation: identify aspects/objects that are or are not in an image; big in machine vision
- detect 200 categories in 100,000 images; large set is useful to many areas of research; expensive to scale, so exploit the hierarchical structure of concepts; correlation and sparsity; kind of like 20 questions for MTurk participants
- how to select the right questions: utility, cost, accuracy
- results: 20,000 images from set, 200 category labels; accuracy 99.5%+, 4-6x as fast
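A hedged sketch of how a label hierarchy cuts question cost: a "no" on a broad category skips all of its subcategories, exploiting correlation and sparsity. The hierarchy, oracle, and question counting are made up for illustration:

```python
# Sketch of hierarchy-aware labeling ("20 questions"): ask about a broad
# category first; a "no" rules out every subcategory. Toy hierarchy.
hierarchy = {
    "animal": ["dog", "cat", "bird"],
    "vehicle": ["car", "bicycle"],
}

def annotate(truth, ask):
    """Return (labels present, number of questions asked via `ask`)."""
    present, questions = [], 0
    for parent, children in hierarchy.items():
        questions += 1
        if not ask(parent, truth):   # no animal at all?
            continue                 # skip every child question
        for child in children:
            questions += 1
            if ask(child, truth):
                present.append(child)
    return present, questions

def oracle(label, truth):
    """Stand-in for a crowd worker; a parent is present if any child is."""
    if label in hierarchy:
        return any(c in truth for c in hierarchy[label])
    return label in truth
```

For a sparse image with no relevant objects, only 2 questions are asked instead of 5, which is where the speedup in the results comes from.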