>> From the Library of Congress in Washington, DC. ^M00:00:06 [silence] ^M00:00:17 >> TOMOKO STEEN: Welcome, everyone. I am Tomoko Steen of the Science, Technology, and Business Division here at the Library of Congress. Today's event is organized by our division, Science, Technology, and Business. Sometimes we co-sponsor these events with other parts of the Library. Today's speaker, Dr. Ilya Zaslavsky, is director of the Spatial Information Systems Lab at the San Diego Supercomputer Center, which is part of the University of California, San Diego. His research focuses on very complex problems: distributed information management systems and spatial and temporal data integration. Dr. Zaslavsky received his PhD from the University of Washington in geology in 1995, and before he came to this country, he also received a PhD from the Russian Academy of Sciences in 1990. He has led design and technical development for several large cyber-infrastructure projects supported by the National Science Foundation, including Asque and the national-scale Hydrologic Information System. He is also co-chair of the OGC/WMO Hydrology Domain Working Group, which develops international standards for water data. I hope I said that right. Today, Dr. Zaslavsky is going to talk about a very exciting topic: the SDGs, the Sustainable Development Goals. The SDGs are a United Nations initiative, and the long name is "Transforming Our World: the 2030 Agenda for Sustainable Development." This is a set of 17 global goals with 169 targets. The aims are to end poverty, protect the planet, and ensure prosperity for all of us on Earth. The goals were put forward in 2015 by the UN, and rather than go over the difficult words myself, I think it is best to let the speaker explain. So, without further ado, please join me in welcoming Dr. Ilya Zaslavsky. [applause] ^M00:03:42 >> ILYA ZASLAVSKY: Thank you very much. Thank you very much, Dr. Steen. I am Ilya Zaslavsky. I am going to talk about sustainable development goals, but the sustainable development goals are just one of the applications of the software that we developed. I will start by describing general trends in computing and in cyber-infrastructure, try to talk through the transformations that are happening in the computing world, and show how the tools that we have been developing recently help bring in more people, engage users, and make it easier to access and analyze data. So one of the applications is sustainable development goal indicators, but there are several other applications and, time permitting, I will show you some; they range from Picasso and Van Gogh paintings to biodiversity image collections to other library collections. This talk may be a little unusual because I don't have that many slides. I will do the introduction and then we'll try to switch to a live demo. I'll tell you where I come from. I work at the San Diego Supercomputer Center, which is a fairly large and one of the oldest supercomputing centers in the country. It was one of the five original centers founded in 1985. We work on high-performance computing and advanced networking, but more recently the focus has been cyber-infrastructure. I hope you've heard this word. The key part here is infrastructure. We're trying to make it such that you don't need to understand the details of computing in order to be able to use it, similar to the electrical grid. The electrical grid is a very common metaphor used to explain infrastructure. 
You plug your appliance into an outlet and you should be able to use it without understanding the details of how electricity is generated. In computing, there are of course lots of databases and cloud services and web services and devices, and many different components are involved in delivering the data and services to users, and users are not supposed to understand the mechanics of it. They should be able to just plug in, to find the data or bring their own data, and be able to use it. That is, currently, a dream and a very big challenge, and research in this area is being funded by the National Science Foundation, NASA, DOA, and other agencies. Right. So, cyber-infrastructure. This slide is from Jim Kurose, who heads the Computer and Information Science and Engineering (CISE) directorate at NSF. Just to give you an outline of what cyber-infrastructure is and what the vision of cyber-infrastructure at NSF is: you have a lot of different types of resources that include cloud resources and various compute systems, storage systems, and so on, and some of these are supported by NSF, and there are of course lots of resources in the commercial world, there are international resources, and so on. On top of that, there is a middleware layer that manages communication and exchange between different systems and helps orchestrate the systems working together. There are some acronyms here which I am not going to spell out, but if you have large data sets sitting in different storage systems, you may want to connect them, and so there is a brokering system. If you want to access these different systems, you may need a single sign-on system so that you don't have to log in to each individual system to work with it. And on top of it, there are science-focused applications, and there are a lot of tools that have been developed in this domain, including various APIs and portals and gateways that support different sciences. So, the dream is to make all of this more plug and play, so that we don't have to think about each individual component and the details are hidden from us, moving toward something like the electric grid. This is, as I said, challenging, because we see a lot of transformation going on in different areas of computing. That's another slide from NSF which outlines the so-called cyber-infrastructure ecosystem. That includes organizations and compute resources and software and networking. Changes are happening in all of that. Software used to be developed by small groups; now it is a large-scale collaborative activity that involves multiple groups working together, so they have to communicate, they have to be able not to step on each other's toes, and to make this work smoothly, organizations have to align what they do to support collaborative science. Scientific instruments have to plug in to the same infrastructure and be able not only to provide data but also to provide information about how the data have been collected, so that you can always go back, understand the origins, and try to explain where they come from. And of course, there are huge changes in the data. What we have been generating in a year or two exceeds what has been generated in all previous times. And so the questions are: do we want to store this data? That's not possible, because we generate more data than we can store. Do we want to somehow mine this data? How do you decide what is important to keep and what can be discarded? And, most importantly, how do we use it such that it brings benefits for everybody? 
So this is a very complex system, and transformations are going on in all parts of it, so there's no surprise that people are talking about a revolution and sometimes call it the "big data revolution." These are just some of the snapshots that I pulled off the web. We're talking about forty zettabytes of data in the digital universe, and a zettabyte is 10 to the power of 21 bytes, a sextillion or something like that. We have a huge amount of content posted on YouTube and Facebook, and 200 million emails sent each minute. So this is a huge amount of information to handle, and if you want to make use of this information, there should be tools that can quickly parse this data. And so the entire software ecosystem is changing to tune to this amount of data. How have people defined big data? In 2001 there was an original paper that started with 3 V's: big data is data that has huge volume, comes at you with high velocity, and presents a large variety of data types. That idea of adding more V's became quite popular, and that's a clear sign of marketing hype, of course, so eventually you end up with vendors claiming that only their software can solve your "verisimilitude" problems, and other types of V's that have been added. But volume, velocity, and variety are perhaps the most important characteristics that everybody who works with data has been trying to tune to. ^M00:13:02 >> So this is just to emphasize the point that you really want to have a grid-like system, where you don't want to have to understand the crazy variety of tools and systems that exist. They have overlapping boundaries, they don't necessarily talk the same language, they don't necessarily connect with each other. It also highlights the importance of nurturing new types of scientists who can actually navigate this huge ecosystem and figure out what is usable and what is connectable. That's the big data landscape chart, which is compiled in a new version every year, and I'm sure there are lots of systems that aren't mentioned on it. What have we been doing in this space? This is my lab. We have been working mostly on spatial data integration and large information systems, and I'll just mention a couple of them. One is the Hydrologic Information System. I note here that this is probably the largest in the world; it's hard to make claims like that, because it's not a single database. It's a system where you can connect to multiple databases that follow the same language and the same protocols. So the first thing we did when we tried to organize such a system was to develop this common language. It is called WaterML, the Water Markup Language. If you can expose your data in this language, then it will be unambiguously interpreted by any client, and that is an important component of different systems talking to each other. This language has been adopted by multiple groups, universities, and federal agencies, and as a result we have a system where, from a single client, you can query all of them. Another interesting example is the integration of brain data. When you work with neuroscience atlases, in this case atlases of the rodent brain, people produce imaged slices and organize them as stacks. And it is important to understand how the distribution of the signal that you see in one collection of slices relates to something that has been observed by other people, in another country and in a different research group. 
The key question here is how you can relate information about location in the brain. It's not the Earth, so there's no latitude and longitude here, not to mention that all brains are different. So you have to come up with some kind of probabilistic specification of location and then develop services that will exchange information about locations, so that once you register your slices to the system, you would be able to get gene expression, segmentations, and other data that somebody has observed in that part of the brain. So, you see from these two examples that we've been working mostly in the middleware realm, but I was actually planning on talking about a different project where I think we are trying to tap into a totally different part of the big data challenge, and that is making data accessible and easy to use for large groups of people. So I will be talking about a system called SuAVE. SuAVE, no reference to shampoo, stands for Survey Analysis via Visual Exploration. And surveys are treated somewhat generically here. It's not just questionnaire surveys. It could be biodiversity surveys or vegetation surveys, soil surveys, and other types of collections where you have information that may come from observations, from annotations, from notes, from devices, from anything. And so you have heterogeneous information that you need to explore to figure out where to start subsequent analysis. You see that there are different types of applications even on this screen; I have a public opinion survey and Van Gogh paintings, where each painting has some metadata that you can use to slice and dice the collection. You can have different views of the data: map views, bar charts, cross tabs, and so on. Before I show you the demo, I will talk about another type of data revolution. So, "data revolution" is a term that is used by the United Nations to highlight the challenges that people now have with managing development worldwide. It's very important to have information about what's going on in different countries and how countries perform on various measures, so as to be able to make decisions and to assess where each country is with respect to the sustainable development goals. So in 2014, there was a UN report that proposed a global partnership for ... ^M00:19:14 >> Yes, thank you. A global partnership for sustainable development data. The key component of it was agreeing on sustainable development goals and developing indicators to monitor the progress of every country that subscribed to this document, and there are 194 countries that agreed to follow that system. So in September of 2015, this program, "Transforming Our World: the 2030 Agenda for Sustainable Development," was adopted, specifying 17 sustainable development goals. This is the program for the United Nations for the next 15 years. In the previous 15 years, the program was called the "Millennium Development Goals"; that was adopted after the Millennium Summit in 2000 and was also focused on a 15-year period. There were eight goals. The progress has actually been quite remarkable on these eight goals, but very uneven across countries. And these 17 goals have just started, so we have baseline data for each country on these goals. The goals include eradication of poverty and hunger; establishing conditions for good health and well-being; education; and a bunch of other matters, targets that deal with development, especially sustainable development, ecological factors, and social inclusion. Let's try. Right. 
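Before the live demo, here is a minimal sketch of the kind of slicing and dicing described above, expressed in Python with pandas rather than in SuAVE's browser interface. The file and column names (collection.csv, region, income_group) are hypothetical stand-ins, not part of SuAVE; the point is only that a faceted collection is just a table with one row per item and one column per metadata field.

    # A sketch of faceting a "survey" table the way SuAVE does interactively:
    # one row per item (country, respondent, painting, ...), one column per
    # metadata field. File and column names here are hypothetical.
    import pandas as pd

    items = pd.read_csv("collection.csv")

    # Bar-chart view: distribution of one categorical variable.
    print(items["region"].value_counts())

    # Cross-tab view: two variables against each other.
    print(pd.crosstab(items["region"], items["income_group"]))

    # Slice and dice: keep only items matching one facet, then look at
    # another variable within that subset.
    subset = items[items["region"] == "Sub-Saharan Africa"]
    print(subset["income_group"].value_counts(normalize=True))
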
So this is the website, suave.sdsc.edu, and you can see that there's a gallery, there are news, there are blogs, and a few other common elements of a website. If you go to the news and search for SDGs, there will be an application that deals with SDG indicators. What I have here is a collection of flags, one for each country. You can look at different types of representations, maps, and so on. I can map by, let's say, Sustainable Development Goal rank. All countries are ranked in terms of how close they are to fulfilling the goals. If I look at, let's say, Sustainable Development Goal rank and take, let's say, the first 40, you would see that the best countries, in terms of closeness to the SDG targets, are the Scandinavian countries. ^M00:22:30 >> AUDIENCE: What is the source of the data for each country? Do they create and generate their own data? Then it would seem to be, probably, uneven and inconsistent across 200 countries. Where does this data come from? >> ILYA ZASLAVSKY: Very good question. It is a fairly complicated process. There is the UN Statistical Commission, which compiles data from multiple sources, and of course it has to rely on local sources, but there are multiple verifications that happen. The most difficult thing is to align different measurement protocols. But one of the key goals here was to define indicators that can be measured more or less unambiguously across countries. The SDGI rank, the one I showed, is just an average of ranks on the different goals. Does that make sense? >> AUDIENCE: Does SuAVE normalize the data? You said that the biggest challenge is... >> ILYA ZASLAVSKY: No. SuAVE does not do it. People who prepare the data do that. We just take the data and make it accessible and easy to use. Okay. So, here you can... So, these are 40 countries that happen to be in good shape. This is where they are on the map. If you are interested in countries that are at the other end of the spectrum, those are their locations. And this is how they are distributed by rank. In the same fashion you can look at other indicators; you see that there are about 90 indicators here, not the 469 or so indicators that have been proposed, since many of those indicators are actually very difficult to compile information for. So even the indicators that we have here sometimes do not have complete coverage. Okay? You can look at, let's say, poverty: the percentage of people who live on $1.90 per day. This has fairly complete coverage, but for poverty rate, these are only the countries for which a poverty rate has been estimated. So it's not even. It is, I should say, a work in progress. What I'm trying to demonstrate is not where the data come from; you can also click on "About Survey" to get information about the source and follow that source. I want to show what the software can do. So, for example, sorry about this, and... You are looking at ... One interesting characteristic is subjective well-being. Let's look at it. That was estimated based on surveys, and it runs on a scale from 0 to 10, so you can see how countries are distributed. It would be interesting to see how subjective well-being relates to objectively measured variables, right? You can look at, let's say, subjective well-being versus, for example, poverty. And you would see that it's not, of course, a diagonal. There are lots of outliers on both sides. 
There are countries where the population subjectively thinks it is in good shape, but the poverty level may be high, and the other way around. So if you look at these countries, for example, this is where the poverty level is fairly low but subjective well-being is also quite low. These are Cambodia, Gabon, Bulgaria, Sri Lanka; some of these countries experienced unrest in recent years or have been in the shadow of some larger metropolis, and that may contribute to lower values of subjective well-being. There may be multiple explanations that you can develop here. I would probably just invite you to play with the data, because I will try to show you that it's very easy to play with the data and see. Yeah? ^M00:27:32 >> AUDIENCE: I have played around a little bit ... I didn't find this one, but I saw a similar one and played around with it. Is there a way in here to look at the raw data? >> AUDIENCE: Can you repeat the question? >> ILYA ZASLAVSKY: Yes. So the question is, is it possible to look at the raw data? Again, if I go to the "About" page and click on this link, it will open another webpage, where I got the data from. There you can get to the actual spreadsheet, look at the methodological description, and see, in the metadata statements, where these data came from and how they were computed. >> AUDIENCE: And I was looking at it, thinking about how to play around with it with my own data, so is that something you'll talk about in your [crosstalk] >> ILYA ZASLAVSKY: Right now. >> AUDIENCE: Yay! >> ILYA ZASLAVSKY: Okay. And I also have my slides here. Yeah, so, I'll do this in just ... It will be the demo after the next one. So the idea, as I said, is totally different from what I've been doing in my regular research: how to make your data accessible, and how to make it easy and fun and engaging to work with. How many of you have answered surveys? Any type of survey. At work, at ... How many of you have ever tried to access the results of these surveys, or have seen the results of these surveys? Sometimes results come back as documents, as pie charts and bar charts, but if you want to combine and explore the variables, you are out of luck. And that is a very common problem. Even if you go to a government website, such as the National Cancer Institute's, or the General Social Survey's, or another one, all you see is the ability to download data for SPSS, SAS, or Stata. If you are a survey analyst, you are in luck; if you are somebody who is just interested in exploring the data yourself, it may be problematic. So, this is an example of how you can explore a survey using the same data that you can download from the site. This is the General Social Survey. Probably some of you have heard of it. It is a survey that is very important in the US. It's been run by the National Opinion Research Center since 1972. Every two years, on average, there is another round of the survey, and people answer all sorts of different questions about different types of ... I think I should actually have it open somewhere. No? No. Okay. It opened. Typically when you analyze surveys, you work with tables, you work with charts, right? Here, the visual paradigm is totally different. Each respondent is an icon, and you just observe how these people, these icons, form groups. You can zoom in to any person and see how that person responded to questions. You can have a few additional visual dimensions. So here, we know that it's a white woman. 
So you can specify your icons to be representative of certain variables you select, such as race and gender, or it could be other variables; it's your choice. Let's try to do some analysis here. One of the questions was about general happiness. Let's look at how people responded to the question about their level of general happiness. We see that 26% of people say that they are very happy. Let's try to explain that based on some factors. We can try income, we can try other things that have been asked here, or we can try the level of education. Let's see, highest year of school completed. Let's say 15 years and more. That increases your level of education, sorry, your level of happiness, somewhat: from 26% to 31%. You can compute the contribution of this factor, so that's 5%. In fact, this is somewhat similar to a regression model, but based on mixed data, where you just compute the contributions of different factors in the explanation; in this case, the explanation of answering "very happy" in response to this question. Interestingly, there are several questions about education. If you look at the highest year of education completed by your spouse, your level of happiness will actually be significantly higher. So now we have 44% saying they're very happy, as compared to the previous 31%, so the contribution here is 18%. And we can try to analyze it further and see how it differs by gender, or how ... Actually, the first question should probably be qualified by age and marital status, because then we could really compare these things. But you can see that analysis becomes quite simple here; you just add additional factors. So let's say that would be age and gender, and there are all sorts of factors here, so for one group it will go in one direction, for females it will go in a different direction, and I will let you interpret the results yourself. So, now, how do you get your own data into the system? Let's say I have a data set that sits on my desktop and it's just a CSV file: a survey of geoscientists who were asked 300 or so questions about how they use different types of data, how they use data developed by others, what kind of challenges they have when they need to use data, and so on. We can open this data set in Excel, and you will see that this is a very standard Excel file that just has questions as columns, and the rows are respondents. There are 1,500 or so people. So it's fairly simply formatted. Now, when you create an account in SuAVE, you get ... yeah, this is my account, and I have about 88 different surveys in this account. Once you create an account, you will have a gallery like that, but it will be empty initially. So if you want to add a survey, you just say "new survey," point to this file, which is on the desktop here, and give it a name. That's it. ^M00:36:05 >> AUDIENCE: Who owns the data once you've uploaded it onto SuAVE? >> ILYA ZASLAVSKY: Actually, you own the data, and I'll show you in a second. This is your personal gallery. For most people, I don't know what they put there, I just check the storage, because you can specify the data to be private, and then it will not show up in your gallery for others. It will still show up for you to manage, but when others come to your gallery, they will not see it. Right. So we created this survey. You can already analyze it now. 
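As an aside on the "contribution" computed earlier with the General Social Survey: it is simply an increment in conditional frequency, the share of respondents giving an answer within a subset minus the share in the baseline. Below is a minimal sketch of that arithmetic in Python with pandas, assuming a hypothetical GSS-style extract with made-up column names (general_happiness, years_of_school); SuAVE itself computes these increments on the fly in the browser, so the sketch is only to make the arithmetic explicit.

    # Sketch of the conditional-frequency "contribution" described above.
    # The file and column names are hypothetical stand-ins for GSS variables.
    import pandas as pd

    gss = pd.read_csv("gss_subset.csv")

    # Baseline: share of all respondents answering "Very happy".
    baseline = (gss["general_happiness"] == "Very happy").mean()

    # Conditional: the same share among respondents with 15+ years of school.
    educated = gss[gss["years_of_school"] >= 15]
    conditional = (educated["general_happiness"] == "Very happy").mean()

    # The factor's contribution is the increment in conditional frequency,
    # roughly the 26% -> 31% step in the example above, about 5 points.
    contribution = conditional - baseline
    print(f"baseline {baseline:.0%}, with factor {conditional:.0%}, "
          f"contribution {contribution:+.0%}")
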
But it's not really that much fun, because remember we can also associate the shape and color of icons with some variables, and that will make it somewhat more interesting. Initially it's just the default, but everything that you want to work with is already there; the population is already here. So let's close that and make it look more interesting. I will use gender, with male and female silhouettes. So, gender: it will parse the file and suggest icons for me. Okay. I can also say that colors will reflect the primary discipline. Let's say we have geoscientists and computer scientists in that survey, and those who said that they're both geo- and computer scientists. You can change colors, and I'm going to do that now. And then you can also specify what will show up as dynamic text over each person, so when you zoom in, you will get more information about that person. Let's use "my primary role" here. When you submit it, it should update the survey, and now you have a somewhat better way of looking at it and also sharing it with others. So it's much easier than trying to figure out how you can find a tool that will read an SPSS file or a Stata file. Here you can look at distributions, let's say how people answered questions about the importance of different types of data. So: how important are atmospheric data for you? That's the distribution. You may hypothesize that if you are an atmospheric science person, then atmospheric science data will be important for you, which is generally correct. Right. So I'm just subsetting this for people who declare themselves atmospheric scientists, except for one person, who says that the data in his domain are of low importance. I think we can actually click on this guy and see who that is. I mean, not who that is, but how that person answered questions, and try to explain why that happened. So this is a way to zoom in from the overall picture to deviant cases, and to make the transition from your general view to the view of a single person or a small group very smooth and easy. And then if you are interested in this pattern, let's say you want to share it with your coworkers, you have a distributed group, then you would say, "comment on this, look at this interesting behavior on the left." Right? You can save this annotation and then share it with others. You just click share, and that will generate a link that you can put on your Facebook, or send to your coworkers, or put in an email, and when users click on that link, they will open this same view and can take the exploration in other directions. So, that answers your question somewhat. ^M00:41:04 >> AUDIENCE: Yes. Thank you very much. Would you mind also [inaudible]. >> ILYA ZASLAVSKY: Yes. So. Right. In this survey, I don't have location information. If I had latitude and longitude as two columns, then I would just click the map here and it would show me a map. If I don't have latitude and longitude but I have addresses, I would specify that this is address information, and I do believe it will geocode on the fly and create a map. So there are two ways to do that. ^M00:41:44 >> AUDIENCE: To me, it looks like it's a very powerful system for describing data. Is there also a component that would account for some of the statistical power, the statistical properties of surveys or different types of survey designs? >> ILYA ZASLAVSKY: Right. So the question is, can this system be made more traditionally, statistically powerful? And the answer is, somewhat ... 
Well, of course, it's a yes, but it's actually a little bit more complex. We are trying to make it such that visualizations and statistical measures go hand in hand and reinforce each other. When we compute traditional statistics, quite often they do not have as clear a visual expression as increments or decrements of conditional frequencies. Those changes in conditional frequencies are generally computed on the fly on the client side, as I showed before, where you can compute the contributions of selected factors to the explanation. We also have a connection with R. So, I'll show you in a second ... If I would ... Oh, yes. So, here, I can also say that I want to include R, and it will include an icon on the right that will allow me to call R via the OpenCPU interface and pass the variables that I defined as independent variables and dependent variables to R to compute. In this case, we have enabled logit, probit, and log-linear models, but the same bridge can be used for other types of computations. Does that answer? ^M00:43:48 >> AUDIENCE: [inaudible] >> ILYA ZASLAVSKY: Right. So we have applications in different areas, and since I promised you at least Picasso, I'll have time for Picasso and then we might have to wrap up. You may go to the web page and you will find, actually, a bunch of galleries of artists. I have Picasso, Van Gogh, Dali, Balthus, Bosch, Rivera, and a few others. So, let's look at Picasso. And this really demonstrates the power of doing exploration with visual images. These are paintings, and as you see, as you zoom in, you get higher-resolution versions. You have heard, I'm sure, of different periods in his life, like the blue period, the pink period, the African period, and others. Let's try to detect these periods. So, I would turn on paintings that were done at the turn of the century and sort them by year. And here you go. Right? Indeed, towards the end of 1901, he fell into a deep depression that lasted three years, almost four years, and typically it's attributed to the suicide of his friend and fellow Spanish artist Carlos Casagemas, who lived in Paris and was a good friend; and so Picasso went through some very dark times. In most of the paintings, the subjects are beggars and all sorts of ... ^M00:46:17 >> AUDIENCE: How did you get the data that describe the paintings? Is it input as metadata, or is it software that visually analyzes the colors? >> ILYA ZASLAVSKY: Very good question. Both ways. I work with several art historians. They sent me data. So, for example, the Van Gogh data set came from Lev Manovich, who works in New York; he is director of the Software Studies Initiative and also a practicing artist. He studies so-called style spaces. You can place a photograph of a painting in a multidimensional space, look at clusters, and see whether, as an artist moved from, let's say, one place to another, it corresponded to a change in clusters. >> AUDIENCE: Based on what? >> ILYA ZASLAVSKY: Yes, based on brightness, hue, saturation, plus other things that you get from the picture. Other things become more difficult. Brightness, hue, and saturation are fairly easy to get, and this is what we have in Van Gogh. So maybe I can show you. There are two ways; there are actually more than two ways. One is to pre-compute some of these measures based on image characteristics. 
Another is to let people come to this application and annotate it in the way I showed you, and they will be able to capture some more information. And then we also have a setup where people have annotated items through forms, so that the information goes directly back into SuAVE and people can analyze it immediately. Does that answer your question? Right. So, in the future, we are planning to have a somewhat more advanced system where people will be able to define for themselves what kind of additional attributes of pictures they want to extract and bring into an analysis. For now, it's done outside the visual interface. ^M00:48:33 >> AUDIENCE: Did you have some animal locomotion there? When I first came into the room, that was on the screen. There were horses running, and I wondered if you had worms crawling and turtles going and what all. And I thought I saw other animals there. But you don't. >> ILYA ZASLAVSKY: I don't have horses. I have tigers. >> AUDIENCE: Oh. >> ILYA ZASLAVSKY: So, it's been used for camera trap images. You can look at a collection of images of, let's say, tigers or onkas or ... It all depends on what metadata you have. You can see at what times of day they are active, or at what temperatures they tend to show up, and it's actually interesting to compare that across different species. I also have examples from a system called iNaturalist, where people contribute photographs of what they observe in their yards, and that goes into a large image archive with metadata. So I have some examples with that. Not with horses, no. Yes, and I have one last slide, I think. Yeah, the types of applications I have more or less described, so I will skip that, and the last slide is really an invitation to explore what you can do with the data yourself. It's a free, open-source system, it's supported by the National Science Foundation, and you can create a login on the system, get your personal gallery, upload your data, and play with it. We opened it up about 5 months ago, we have 90 users who have loaded their data, and you're very welcome to do this. I really think that making data easily accessible will make a dramatic impact on this so-called data revolution, or make a real difference there, because it will increase the level of trust that people have in any statements based on data, and it will let people play with the data themselves, which is interesting. Thank you very much, and sorry about the difficulties. [applause] ^M00:51:12 >> ILYA ZASLAVSKY: Questions? >> AUDIENCE: I have another question. Could this handle a genuinely big data set, like a computational fluid dynamics data set? >> ILYA ZASLAVSKY: We'll need to talk. So, the question is, "can this handle very large data sets?" We have interfaced it with a system that has about 400,000 records. Of course, you cannot put 400,000 dots on a single screen. The maximum we had so far was 8,300, and these were images of macrofossil samples from the British Geological Survey. It's a fairly large application, about 30 GB worth of data, well, I should say, in one screen. But really, at some point these are dots, and when you zoom in, you actually get more information. So, with large collections, you have to first search and come up with some subset that will be meaningful to display in SuAVE, and we've done that. So, if you have a specific application in mind, we can talk. >> AUDIENCE: [inaudible] >> ILYA ZASLAVSKY: Thank you. 
[applause] ^M00:52:32 >> This has been a presentation of the Library of Congress. Visit us at loc dot gov.