<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>iSchools and the Digital Humanities</title>
	<atom:link href="http://www.ischooldh.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ischooldh.org</link>
	<description></description>
	<lastBuildDate>Mon, 29 Aug 2011 22:09:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Reflections about scale and topic modeling</title>
		<link>http://www.ischooldh.org/2011/08/reflections-about-scale-and-topic-modeling/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=reflections-about-scale-and-topic-modeling</link>
		<comments>http://www.ischooldh.org/2011/08/reflections-about-scale-and-topic-modeling/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 12:36:10 +0000</pubDate>
		<dc:creator>Sayan Bhattacharyya</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=407</guid>
		<description><![CDATA[In the last few weeks (as you have seen from previous blog posts), I have been working on the topic modeling project utilizing ongoing, cutting-edge work that is being done here at the University of Maryland in its Computer Science &#8230; <a href="http://www.ischooldh.org/2011/08/reflections-about-scale-and-topic-modeling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In the last few weeks (as you have seen from previous blog posts), I have been working on the topic modeling project utilizing ongoing, cutting-edge work that is being done here at the University of Maryland in its Computer Science department. (In the early part of the internship, as readers will recall, we were working on interface design and Scala programming concerning topic modeling from the Mallet toolkit, which has a slightly different approach and was developed at the University of Massachusetts.)</p>
<p>The question of &#8220;scale&#8221; has been on my mind over the past couple of weeks. We are processing really vast amounts of text data &#8212; topic modeling for text data is the kind of approach whose power of discovery is predicated on the assumption that vast amounts of data will be available for it to run on. It makes me pause and reflect that the assumption that these approaches will keep becoming more prominent and visible in the coming years rests on some other assumptions, which are both technological and social. For one thing, increased success for these approaches will depend on Moore&#8217;s Law continuing to hold (i.e. more and more processing power being available more and more cheaply), and also on the willingness (and legal ability) of those libraries and institutions that own such vast repositories of texts to make them available in computer-readable formats. I realize that it is studying information science at an info-school (I am an SI student at Michigan) which makes me think about these additional dimensions. If I had remained just a computer-science person, I probably wouldn&#8217;t have thought about just how much socio-technical infrastructure is needed to put so much text online, and if I had remained a humanities person (which I also have been in the past), then it might not have occurred to me to think about the underlying technological breakthroughs in electronics that are making such continued scaling-up possible (and will hopefully continue to do so in the future) for such approaches as topic modeling. I appreciate how being a student of Information Science attunes me to think about the entire ecology within which a particular approach is being developed.</p>
<p>While the availability of vast and increasing volumes of data makes one think of issues of <em>quantitative</em> scale, I also had an appreciation, over the last couple of weeks, of what one might call the <em>qualitative</em> scale of the challenge posed by taking this approach, especially when one tries to improve on the sophistication of the underlying algorithm by bringing, for example, domain knowledge to bear on the problem. An example from what we have been doing: earlier, we were working with the &#8220;unsupervised&#8221; topic modeling approach, in which no knowledge of the content of the text is really needed &#8212; the algorithm simply cranks away at whatever text corpus it is working on, and discovers topics from it. For the last week or so, though, we have focused on the brand-new and cutting-edge &#8220;supervised&#8221; topic modeling approach that is being developed by the computer science folks here at the University of Maryland. The idea in &#8220;supervised&#8221; topic modeling is to &#8220;train&#8221; the algorithm by making use of domain knowledge. For example, for the Civil War era newspaper articles archive that we are working with, we are making use of such related pieces of knowledge coming from sources outside of the corpus, as the casualty rate for each week, and the Consumer Price Index for each month, during the time period that these newspaper articles were being published. The idea behind this approach is that the algorithm will discover more &#8220;meaningful&#8221; topics if it has a way to make use of feedback on how well the topics discovered by it are associated with a parameter of interest. 
Thus, if we are trying to bias the algorithm into discovering topics that more directly pertain to the Civil War and its effects, then it makes sense to align the text with the aforementioned &#8220;other kinds of data&#8221; (in our case, casualty figures and economic figures), which have a provenance outside the text corpus. This is where the &#8220;qualitative&#8221; scale becomes important, I think. The person who will use this kind of approach successfully, in other words, will have to have some grasp, at least, of a wide variety of other fields, and know which information sources to turn to in order to look up additional kinds of data and bring them to bear fruitfully on the problem. The sheer number of areas with which the successful practitioner of this kind of work will, therefore, have to have at least a passing acquaintance will &#8220;scale&#8221; up, the more intelligently we try to leverage these approaches&#8217; power. It also made me realize that, once again, it is people trained in information science &#8212; which is a truly interdisciplinary field &#8212; who are well positioned to do this. Over the last week, for example, I read several papers on the economic history of the Civil War (which we were pointed to by Robert K. Nelson, a historian at the University of Richmond who has worked on topic modeling and history) &#8212; who would have thought that one would have to read something like that in the course of a summer internship in Information Science? I aligned the economic data with the text corpus, and based on what the data seemed to be telling us, I came up with <a href="http://www-personal.umich.edu/~bhattach/econhyp.pdf">a design for some experiments to test out some hypotheses</a>, which we will proceed to carry out over the next few days.</p>
<p>Also, in a piece of exciting news, the <a href="http://www-personal.umich.edu/~bhattach/RhetoricConferenceAbstractFinal.pdf">paper proposal</a> that we (Travis, Clay and I) submitted to the &#8220;Making Meaning&#8221; conference for graduate students, organized by the Program in Rhetoric at the English Department of the University of Michigan, has been accepted. In preparing this presentation, too &#8212; which is going to be a reflection on how one might situate approaches like topic modeling in the context of literary theory and philosophy &#8212; I think we will find that our interdisciplinary training as &#8220;information-science&#8221; people really helps us to see, and think, in terms of the &#8220;big picture&#8221; &#8212; to <em>scale up</em> to the big picture, as it were.</p>
<p>P.S. Since this post has been a reflection on the question of <em>scale</em>, it just occurred to me that it is also appropriate that the programming language I learned during the earlier part of the internship was &#8212; <a href="http://www.artima.com/scalazine/articles/scalable-language.html">Scala</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/08/reflections-about-scale-and-topic-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Week 8</title>
		<link>http://www.ischooldh.org/2011/07/week-8/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=week-8</link>
		<comments>http://www.ischooldh.org/2011/07/week-8/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 01:32:19 +0000</pubDate>
		<dc:creator>j.meyerson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=393</guid>
		<description><![CDATA[My time at CDRH is coming to a close &#8211; Monday is the start of my last week here. I&#8217;ve learned so much! Where do I even begin? As far as how I&#8217;ll be spending my last week here &#8211; &#8230; <a href="http://www.ischooldh.org/2011/07/week-8/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My time at CDRH is coming to a close &#8211; Monday is the start of my last week here. I&#8217;ve learned so much! Where do I even begin?</p>
<p>As for how I&#8217;ll be spending my last week here &#8211; formatting all my documentation! I have some plans for what I think would be most helpful for the next student who comes along to work on this project, including: a diagram of how the JavaScript and PHP files talk to each other; an annotated list of web resources that were invaluable to me; recommendations for future work; a database design document that lists and describes the purpose of each table, the fields therein, and the relationships between tables; and a review of the software tools that I used while working on my project. Documentation always made sense to me, but during the course of spring semester&#8217;s Projects in Permanent Retention of Electronic Records, I learned the true importance of documentation. And not just documentation, but documentation <strong>as you go</strong>. When my group was first assigned our project and told where to look for existing documentation, we were both excited and a little scared. The archival imaging machine we were tasked with getting up to spec was a little too decontextualized for our taste. We knew what it was for, sort of. We knew some of the individuals that had worked with it. But we figured that if we could talk to everyone that had shared its past, and find out what had worked and what hadn&#8217;t and the rationale behind certain design and software choices, we and anyone who came after us would be able to make considerably more progress than if we all had to make the same mistakes all over again.</p>
<p>By the end of the semester, we had performed and transcribed a series of oral history interviews, exhausted a wiki, added to the archival imaging procedures and turned them into an illustrated manual, created a visual topology of the machine, and compiled an abbreviated/narrative version of the wiki into a project report. (Granted, I got lucky &#8211; my group was amazing!) And even after all that, we all still felt that there was so much more to write down, so much more material to cover, so much more to do! The wiki had been essential for providing a space for us to jot down whatever tests we had run, whatever research we had done &#8211; a space to propose hypotheses about what went wrong and to figure out what needed to happen next. We tried so many different things that had we not documented all this as we worked, there would have been no way to reconstruct all the things we had done that failed (arguably the most valuable information for someone taking over the project), and all the places we had looked for answers. During this summer, I&#8217;ve been keeping track of questions and discoveries each day so that when I come in to work in the morning, yesterday&#8217;s questions propel me forward in my work. And now, at the end, I can compile that into a tool for someone else to use to propel them forward. The iSchoolDH blog has also been a useful source of documentation for me. I&#8217;ve already referred back to previous posts in the last few days to grab some info for the resources I&#8217;m currently compiling. So &#8211; documentation, good.</p>
<p>My time here has undoubtedly informed my understanding of what Digital Humanities is, and has forced me to think about its implications for many areas of scholarship. It has also stirred my imagination about the shape digital humanities might take at UT, where iSchoolers fit into the equation, and how I might be involved in answering those questions.</p>
<p>I would like to say thank you to Keith Nickum, Programmer at CDRH, for all his help and patience. He is the creator of the Whitman Tracking application and has been a tremendous resource over the duration of my time here. Thank you to all the CDRH faculty and staff for making me feel at home in Lincoln.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/week-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>getting there with XSLT</title>
		<link>http://www.ischooldh.org/2011/07/getting-there-with-xslt/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=getting-there-with-xslt</link>
		<comments>http://www.ischooldh.org/2011/07/getting-there-with-xslt/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 14:27:38 +0000</pubDate>
		<dc:creator>Molly Des Jardin</dc:creator>
				<category><![CDATA[CDRH]]></category>
		<category><![CDATA[Summer 2011]]></category>
		<category><![CDATA[University of Michigan]]></category>
		<category><![CDATA[functional]]></category>
		<category><![CDATA[problem-solving]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[recursion]]></category>
		<category><![CDATA[text analysis]]></category>
		<category><![CDATA[tokenx]]></category>
		<category><![CDATA[xslt]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=327</guid>
		<description><![CDATA[(Note: This is from Molly Des Jardin at CDRH.) Now that I am, as the title implies, &#8220;getting there,&#8221; I want to reflect a little on the learning process that has been XSLT. In my last post I glossed over &#8230; <a href="http://www.ischooldh.org/2011/07/getting-there-with-xslt/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>(Note: This is from Molly Des Jardin at CDRH.)</p>
<p>Now that I am, as the title implies, &#8220;getting there,&#8221; I want to reflect a little on the learning process that has been XSLT. In my last post I glossed over what makes it (and functional programming languages generally) distinctive and, for people who are used to procedural languages, unintuitive and hard to grasp at first. This will be a post with several simple points, but that&#8217;s quite in keeping with the theme.</p>
<p>The major shift in thinking that needs to happen when working with XSLT, in my opinion, is one of <em>trusting the computer</em> more than we are accustomed to. It all stems from letting go of telling the computer how exactly to figure out when to execute sections of code, and letting it make the decisions for us.</p>
<p>I made a comment recently: &#8220;I know I&#8217;m getting more comfortable with XSLT because suddenly I&#8217;m trying to use recursion everywhere I can, and avoiding the for-loop like a crutch.&#8221; As others I talked to put it, this is idiomatic XSLT.<a href="#footnote1">*</a> In other words, it&#8217;s one of the mental leaps that you (and I) have to make in order to start writing elegant and functional code (no pun intended) using this language.</p>
<p>What is recursion? In this case, to oversimplify, it&#8217;s how XSLT loops.<a href="#footnote2">**</a> In a procedural language &#8211; C++, Java, most languages other than Lisp dialects to be honest &#8211; recursion is clunky and wasteful; telling the computer to specifically &#8220;do this for the number of times I tell you, or until this thing reaches this state&#8221; is how you get things done. This means that the languages have state, too &#8211; you can change the value of variables. This is important for having counters that are the backbone of those loops. If there were no variable to increment or change in another way, the loop would either never execute (such as a while), only execute once, or loop endlessly. None of these things are very helpful.</p>
<p>So how do you get the effect of a counter-based loop, at least of the &#8220;for each thing in this set&#8221; variety, in a stateless language (all variables are permanent, aka constants) that discourages use of for-each loops in the first place? There are two parts to the answer.</p>
<p>The first is much simpler: xsl:apply-templates or xsl:call-template. This involves the trust that I introduced above. With a procedural language it&#8217;s hard to trust the computer to take care of things without your telling it exactly how to do it (keep doing this thing until a condition is met) because you&#8217;ve had to become so used to it. It might have been hard to get used to having to explain the proverbial peanut butter sandwich recipe in excruciating detail for the sandwich to get made. Now, XSLT is forcing you to go back to the higher level of trust, where you can tell the computer &#8220;do this for all X&#8221; without telling it <em>how</em> it&#8217;s going to do that.</p>
<p>xsl:apply-templates simply means, &#8220;for all X, do Y.&#8221; (The Y is in the template.) It&#8217;s unsettling and worrying, at least for me at first, to just leave this up to the computer. There&#8217;s no guarantee that a given template will ever be executed (it only fires when something matches it), and the order in which templates appear in the stylesheet says nothing about when they run. How can I trust that this is going to turn out okay? Yet, with judicious application of xsl:apply-templates (like, where you want the results to be), it <em>will</em> happen.</p>
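<p>As a minimal sketch of that &#8220;for all X, do Y&#8221; idea, assuming XSLT 1.0 (the <code>menu</code> and <code>item</code> elements and the <code>label</code> attribute are names invented for illustration, not taken from the project):</p>
<pre><code>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

  &lt;!-- For the hypothetical root element "menu": open a list, then hand
       every child "item" to whichever template matches it. --&gt;
  &lt;xsl:template match="menu"&gt;
    &lt;ul&gt;
      &lt;xsl:apply-templates select="item"/&gt;
    &lt;/ul&gt;
  &lt;/xsl:template&gt;

  &lt;!-- The "Y": what happens for each "item" that was selected. --&gt;
  &lt;xsl:template match="item"&gt;
    &lt;li&gt;&lt;xsl:value-of select="@label"/&gt;&lt;/li&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code></pre>
<p>Nothing here says how to walk the items. The <code>ul</code> simply ends up with one <code>li</code> per item, in document order, at the spot where the <code>xsl:apply-templates</code> was placed.</p>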
<p>Second, the recursive aspect. Keep calling the template until there are no more things left &#8211; whether that&#8217;s a counter, or a set of stuff. But how to get a counter without being able to change the variable? With each xsl:apply-templates (or call-template), do so with xsl:with-param, and adjust the parameter as needed. Call it with the rest of the set but not the thing that is being modified in the current template. When it runs out of stuff, that is when results are returned. Again, it takes the explicit instruction &#8211; xsl:for-each is very heavy-handed &#8211; and turns it into &#8220;if there&#8217;s anything left, keep on doing this.&#8221; It may seem from my description that there&#8217;s no real difference between these two, and in their end result, there isn&#8217;t. But this is a big leap, and moving from instinctively reaching for xsl:for-each to xsl:apply-templates is conceptually profound. It is getting XSLT.</p>
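<p>A rough sketch of that recursive pattern, again assuming XSLT 1.0 and the same invented <code>menu</code>, <code>item</code> and <code>label</code> names (the template name <code>number-items</code> and its parameters are also hypothetical):</p>
<pre><code>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="text"/&gt;

  &lt;!-- Start the recursion with the whole set; the counter defaults to 1. --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;xsl:call-template name="number-items"&gt;
      &lt;xsl:with-param name="items" select="/menu/item"/&gt;
    &lt;/xsl:call-template&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template name="number-items"&gt;
    &lt;xsl:param name="items"/&gt;
    &lt;xsl:param name="n" select="1"/&gt;
    &lt;!-- "If there's anything left, keep on doing this." --&gt;
    &lt;xsl:if test="$items"&gt;
      &lt;!-- Handle the first thing in the set. --&gt;
      &lt;xsl:value-of select="$n"/&gt;
      &lt;xsl:text&gt;. &lt;/xsl:text&gt;
      &lt;xsl:value-of select="$items[1]/@label"/&gt;
      &lt;xsl:text&gt;; &lt;/xsl:text&gt;
      &lt;!-- Call again with the rest of the set and an adjusted counter;
           no variable is ever changed, a new value is passed instead. --&gt;
      &lt;xsl:call-template name="number-items"&gt;
        &lt;xsl:with-param name="items" select="$items[position() &gt; 1]"/&gt;
        &lt;xsl:with-param name="n" select="$n + 1"/&gt;
      &lt;/xsl:call-template&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</code></pre>
<p>Run against a <code>menu</code> holding three items labelled A, B and C, this should produce <code>1. A; 2. B; 3. C; </code>. The counter never changes inside any one call; each call just hands a new value down to the next.</p>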
<p>Finally, a note on the brevity and simplicity of XSLT. I&#8217;ve noticed that once I&#8217;ve found a good, relatively elegant solution to what I&#8217;m trying to do (they can&#8217;t always be!), suddenly my code becomes very short and very simple. It&#8217;s not hard to write and I don&#8217;t type for a long time. It&#8217;s the thinking and planning that takes up the time. Obviously this is true for programming just about anything, but I find myself doing a whole lot less typing this summer than usual (compared to languages I&#8217;ve used such as C, C++, Java, Python).</p>
<p>It&#8217;s both satisfying and disappointing at the same time: getting a template that recursively creates arbitrary nested menus makes me want to jump up and high five myself; the fact that it&#8217;s only about four lines and incredibly simple makes me wonder if any of it was that hard to begin with. But this isn&#8217;t limited to XSLT or even programming: the 90-page thesis seems like more work than the 40-page thesis, but if the shorter one is talking about more profound ideas and/or is simply better written, the length and time comparison falls apart. The time spent typing and the length of the output don&#8217;t tell us as much as we&#8217;re used to assuming.</p>
<p>That&#8217;s what I have to say about what I&#8217;ve been doing this summer, as far as learning XSLT goes. I still can&#8217;t say I like it. The syntax is maddening. I haven&#8217;t been in this long enough to judge whether it&#8217;s the best choice for getting something done within a lot of constraints. But at the very least I&#8217;ve finally had that brain shift again, the one I had with Lisp so long ago, to a different approach to problem-solving entirely. And that feeling is profoundly gratifying.</p>
<p>Speaking of a good feeling, I&#8217;ve been able to have extended chats with multiple people about XSLT on the U of M School of Information mailing list this summer after someone posted asking for help with it. It&#8217;s a good thing I replied despite thinking &#8220;I&#8217;m not an expert, so I probably don&#8217;t have much to offer.&#8221; Talking with the questioner and the others who replied-all on our emails was really enlightening: getting feedback, hearing others&#8217; questions about how the language works (questions that I hadn&#8217;t articulated very well), and also giving my own feedback. There&#8217;s nothing like teaching to help you learn. I would not have been able to write this post before talking to my fellow students and figuring it out together. (Or, you would have read a very unclear and aimless post.)</p>
<p>(Very last, I&#8217;d like to recommend the O&#8217;Reilly book <i>XSLT Cookbook</i> for using this language regularly after getting acquainted with it. If I were continuing on with an XSLT project after this internship, or working on adding more to this one, I&#8217;d be using this book for suggestions.)</p>
<p><a name="footnote1">* Thank you all for reminding me that this word exists.</a></p>
<p><a name="footnote2">** XSLT 2.0 includes not only the xsl:for-each loop, but also the XPath 2.0 for expression. These do have their appropriate uses and I do use them quite a lot, because my application doesn&#8217;t give me a huge number of chances for recursion. I&#8217;m being dramatic to make a point.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/getting-there-with-xslt/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Week 7</title>
		<link>http://www.ischooldh.org/2011/07/week-7/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=week-7</link>
		<comments>http://www.ischooldh.org/2011/07/week-7/#comments</comments>
		<pubDate>Sun, 17 Jul 2011 23:52:53 +0000</pubDate>
		<dc:creator>j.meyerson</dc:creator>
				<category><![CDATA[CDRH]]></category>
		<category><![CDATA[Summer 2011]]></category>
		<category><![CDATA[University of Texas]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=376</guid>
		<description><![CDATA[&#8216;Digital humanities&#8217; and &#8216;digital scholarship.&#8217; To many, this distinction may seem pointless  or premature but I’ve been struggling to articulate what that distinction is. Why? Because after bookmarking a ton of sites, (anywhere from Centers’ &#38; Studios&#8217; project websites to &#8230; <a href="http://www.ischooldh.org/2011/07/week-7/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>&#8216;Digital humanities&#8217; and &#8216;digital scholarship.&#8217; To many, this distinction may seem pointless  or premature but I’ve been struggling to articulate what that distinction is. Why? Because after bookmarking a ton of sites, (anywhere from Centers’ &amp; Studios&#8217; project websites to the personal &amp; professional blogs of digital humanists), I’m finding that the range is so broad in humanities computing,  that a distinction is called for. Besides, scholarship evokes something specific, namely – finding an answer to a question or resolving a contradiction using the dialectical method.  Digital scholarship, in my opinion, should maintain its dialectic character but should also be experimental with the digital format in terms of how it establishes and, more importantly for this post, propels that dialog forward.</p>
<p>I think one reason that traditional, print-based scholarship is hard to move away from – why it&#8217;s hard not to make it the baseline for evaluating other forms of scholarship (performance, scholarly digital experimentation, etc.) – is that its evaluation process helps it to fit quite nicely within the parameters set by definitions of the dialectical method. According to Wikipedia, &#8220;Scholarly peer review is the process of subjecting an author&#8217;s scholarly work, research, or ideas to the scrutiny of others who are experts in the same field [that presumably hold different viewpoints about a subject], before a <em>paper</em> describing this work is published in a journal. The work may be accepted, considered acceptable with revisions, or rejected. Peer review requires a community of experts in a given (and often narrowly defined) field, who are qualified and able to perform impartial review.&#8221;</p>
<p>I would argue that projects like <a href="http://whitneyannetrettien.com/thesis/">http://whitneyannetrettien.com/thesis/</a> are part of the future of digital scholarship, and I also think calling this example &#8216;scholarship&#8217; makes some feel uncomfortable. Why? Does it make a reasoned argument? I think so, yes. Do the format and the content imply an invitation for dialog? I think so, yes. I haven’t found many things like Trettien’s. And I’m not saying that I think she’s the greatest writer or that her arguments are mind-blowing (or maybe they are); the argument itself is not the point. The point is that she presents a reasoned argument, that the presentation is visually &amp; functionally creative, and that her platform is the web &#8211; what&#8217;s not dialectical about that combination?</p>
<p>So maybe the question is &#8216;dialog with whom?&#8217; Is it speaking to a narrow field of qualified experts? What field? History of Science? Humanities computing? Computer science? Literature? If we aren’t sure what field to classify it under (partly due to the platform – are we readers or are we users?), then who is qualified to review it – who is invited to participate in the dialog that gets at the truth of the matter? And does the scholarly peer review process fail when you check the box that says ‘all of the above’?</p>
<p>My point is: We should allow digital scholarship to mean something fundamentally different and its evaluation process should reflect that. Alternatives to peer review, and peer review in electronic publishing, have been written about for more than a decade and are still being written about, but still in the context of the journal article.*(1, 2, 3)</p>
<p>Okay, so what do I think digital scholarship should look like? Well, first I would say that the future of digital scholarship should <em>look</em> a little more like what Trettien is doing (i.e., not just an essay, not just a website, not just a visualization, not just a performance; it’s all of these things on the same page) than, say, an article in First Monday (although First Monday is a consistent source of engaging scholarship about the most relevant topics in the field of information studies and beyond). But aside from that, it needs <em>its own</em> evaluation process, negotiated by the people doing the work. The most obvious reason for having digital scholars themselves negotiate that process, instead of the scholarly community at large, is that they have a vested interest in making sure that the field continues to gain respect, scholarly heft, or whatever else you want to call it. Another reason I&#8217;ll offer is that an existing process can be adapted to work in different contexts, but sometimes, even after investing considerable overhead in &#8216;making it work&#8217;, it still falls short of doing well what it&#8217;s intended to do.</p>
<p>Excuse my ramblings. This is an ongoing thought exercise for me.</p>
<p>* 1. Fitzpatrick, K. (2010). Peer-to-peer Review and the Future of Scholarly Authority. <em>Social Epistemology</em>, 24(3), 161-179. doi:10.1080/02691728.2010.498929</p>
<p>2. First Monday, Volume 4, Number 4 &#8211; 5 April 1999, <em><a href="http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/661/576">Scholarly Publishing, Peer Review and the Internet</a></em> by Peter Roberts</p>
<p>3. Differences &amp; Repetitions Wiki, August 25, 2010, <em><a href="http://www.diffandrep.org/wiki/?q=performing-scholarly-communication">Performing Scholarly Communication</a></em> by Ted Striphas</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/week-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Further adventures in topic modeling</title>
		<link>http://www.ischooldh.org/2011/07/further-adventures-in-topic-modeling/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=further-adventures-in-topic-modeling</link>
		<comments>http://www.ischooldh.org/2011/07/further-adventures-in-topic-modeling/#comments</comments>
		<pubDate>Sat, 16 Jul 2011 01:23:20 +0000</pubDate>
		<dc:creator>Sayan Bhattacharyya</dc:creator>
				<category><![CDATA[MITH]]></category>
		<category><![CDATA[University of Michigan]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=328</guid>
		<description><![CDATA[I realize that I hadn&#8217;t properly introduced myself to readers of the blog in previous postings here at the iSchools-DH blog. (I am Sayan Bhattacharyya, by the way  &#8212; I&#8217;m mentioning my name here as in the body of the &#8230; <a href="http://www.ischooldh.org/2011/07/further-adventures-in-topic-modeling/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I realize that I hadn&#8217;t properly introduced myself to readers in previous postings here at the iSchools-DH blog. (I am Sayan Bhattacharyya, by the way &#8212; I&#8217;m mentioning my name here in the body of the post because this blog currently doesn&#8217;t seem to be displaying posters&#8217; names next to posts.) As my WordPress profile states:</p>
<p>&#8220;I grew up in India, was trained originally as an engineer, later did a PhD in Comparative Literature at the University of Michigan, and now am a master&#8217;s student in the School of Information there. I love riding bicycles.&#8221;</p>
<p>I think that pretty much sums me up in a nutshell, and I should add that I have been riding my bike to MITH since a couple weeks after arriving here. The weather here in College Park, Maryland has been very nice for riding a bike so far.</p>
<p>This past week has been an interesting one. I finally finished up (with much general help from Travis, who works here at MITH) a change that I had been making to the existing topic modeling code in Scala, which will give the reader a little more flexibility and a richer understanding of the topics discovered by the algorithm. This change itself wasn&#8217;t conceptually very complex at all, but it was a very useful learning experience for me in terms of acquiring the skills to program in Scala. Although I had programmed in a functional language in the past, Scala is quite a different cup of tea because it aspires both to be a better Java and to be a full-fledged functional language, which makes its syntax interestingly complex.</p>
<p>Something that I have been thinking about since I started the internship is the question of how to &#8220;theorize&#8221; topic models to a humanities audience. Unlike, say, material artifacts such as hard drives (about which a faculty member here at MITH, Matthew Kirschenbaum, has written very interestingly), such things as topic models are very abstract concepts. It is simply difficult to talk about them with people who are not programmers or mathematicians or statisticians. So, when talking to humanists about what topic modeling is (see my previous post for earlier reflections about this), some careful thought is required. In situations like this, I think, digital humanities has the potential to forge interesting connections with literary theory, and to use the language of theory in the humanities (of which non-digital-humanists all have at least a fair grasp) to &#8220;theorize&#8221; what it is that is happening in such things as topic modeling. An interesting thing that happened this week was that I received, on the University of Michigan School of Information mailing list, an announcement about a conference in the English department back at UM at the end of September, organized by the Rhetorical Studies group at UM, for which the organizers are soliciting presentations from people working in corpus analysis as well as from more traditional scholars of rhetoric. We wrote up a proposal yesterday for the conference, and if the submission is accepted, we may be talking in September about how techniques like topic modeling, when applied to the interpretation of text, stand in relation to longstanding fields of inquiry in the humanities that have concerned themselves with interpretation and hermeneutics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/further-adventures-in-topic-modeling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A belated introduction</title>
		<link>http://www.ischooldh.org/2011/07/a-belated-introduction/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-belated-introduction</link>
		<comments>http://www.ischooldh.org/2011/07/a-belated-introduction/#comments</comments>
		<pubDate>Thu, 14 Jul 2011 21:30:33 +0000</pubDate>
		<dc:creator>Molly Des Jardin</dc:creator>
				<category><![CDATA[CDRH]]></category>
		<category><![CDATA[Summer 2011]]></category>
		<category><![CDATA[University of Michigan]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[introduction]]></category>
		<category><![CDATA[nebraska]]></category>
		<category><![CDATA[tokenx]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[xslt]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=325</guid>
		<description><![CDATA[(This is Molly Des Jardin at CDRH.) Hello readers, and to the rest of the summer 2011 interns! My name is Molly and I&#8217;m working at CDRH this summer in Nebraska, mainly on redesigning and implementing a new interface for &#8230; <a href="http://www.ischooldh.org/2011/07/a-belated-introduction/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>(This is Molly Des Jardin at CDRH.)</p>
<p>Hello readers, and to the rest of the summer 2011 interns!</p>
<p>My name is Molly and I&#8217;m working at CDRH this summer in Nebraska, mainly on redesigning and implementing a new interface for TokenX, a text analysis tool developed by Brian Pytlik Zillig. If you&#8217;d asked me at the beginning of the summer what I thought about working on a project this big using XSLT 90% of the time, I&#8217;d have said &#8211; &#8220;those are just stylesheets for XML, right? bring it on.&#8221;</p>
<p>It&#8217;s not like I was <i>wrong</i> per se. XSLT stylesheets are just that, but I&#8217;m finding that the term is vague to the point of being entirely meaningless. Does &#8220;stylesheet&#8221; help you understand what CSS is? (Then again, does &#8220;cascading&#8221; make it all make sense? No.) But because this is the first place I encountered that word, here I was thinking XSLT is the equivalent of CSS for XML. Yeah.</p>
<p>It&#8217;s totally not. If I could send a message back to myself on May 11, when I started my internship here, I&#8217;d yell &#8220;it&#8217;s a functional programming language with syntax designed by a sadist &#8211; don&#8217;t you wish you&#8217;d kept up with Lisp now?&#8221; Yes, I do wish.</p>
<p>So the entries you&#8217;ll get from me for the rest of the summer will spare you my slow learning of what exactly is going on, and take you instead right into trying to coax my understanding of XSLT, along with TokenX&#8217;s original code, into forming something that makes sense <i>and</i> works. It&#8217;s a tall order but I refuse to come up short.</p>
<p>When I&#8217;m not persevering in the face of the learning curve, I get to spend time with TokenX&#8217;s new HTML5 site, learning about what everyone else is doing at the center, checking out DH2011, and learning about linked data in Ireland. It&#8217;s a good summer.</p>
<p>I&#8217;ve come to this internship with what they call a non-traditional background, so I&#8217;ll give you a little bit about myself. For the past 6 (yes, 6) years I have been at the University of Michigan, working my way through a PhD program in Japanese literature (with my thesis turning more toward book history), and since 2008, an MSI in Library &#038; Information Services at the School of Information. I never thought I&#8217;d get to say these words, but things are coming to an end: I&#8217;ll graduate with an MSI in December 2011, and with my PhD in 2012. What then? Digital humanities, of course!</p>
<p>Until last year, I didn&#8217;t know that this field was out there, but I&#8217;d been doing projects (and research) that could be described as &#8220;DH&#8221; for years. In fact, as an undergraduate I juggled the courseloads of both a BS in computer science and BA in history (Japanese of course). When I came to graduate school, it never occurred to me that I could both program and write a dissertation on 19th century books &#8211; you know, without doing two PhDs. I&#8217;m thrilled to find a place that I can actually do both without having to give up either one.</p>
<p>Stay tuned for more on the fun of using XSLT to do something large and complicated, and for some of the many blog posts in my backlog that were inspired by DH2011 and the DHO summer school&#8217;s metadata/linked data workshop from July.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/a-belated-introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Week 6</title>
		<link>http://www.ischooldh.org/2011/07/week-6/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=week-6</link>
		<comments>http://www.ischooldh.org/2011/07/week-6/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 18:27:40 +0000</pubDate>
		<dc:creator>j.meyerson</dc:creator>
				<category><![CDATA[CDRH]]></category>
		<category><![CDATA[Summer 2011]]></category>
		<category><![CDATA[University of Texas]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=367</guid>
		<description><![CDATA[So, in my first posts, I briefly introduced the tools I would be using to work on the database: PHPMyAdmin (PMA), MySQLWorkbench (MSW), NetBeans &#8211; with just a few more weeks left, I wanted to take the time to reflect &#8230; <a href="http://www.ischooldh.org/2011/07/week-6/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<a href='http://www.ischooldh.org/2011/07/week-6/miniature-workench-mysql/' title='miniature-workench-mysql'><img width="150" height="150" src="http://www.ischooldh.org/wp-content/uploads/2011/07/miniature-workench-mysql.png" class="attachment-thumbnail" alt="miniature-workench-mysql" title="miniature-workench-mysql" /></a>
<a href='http://www.ischooldh.org/2011/07/week-6/phpmyadmin/' title='phpmyadmin'><img width="150" height="150" src="http://www.ischooldh.org/wp-content/uploads/2011/07/phpmyadmin-150x150.gif" class="attachment-thumbnail" alt="phpmyadmin" title="phpmyadmin" /></a>

<p>So, in my first posts, I briefly introduced the tools I would be using to work on the database: PHPMyAdmin (PMA), MySQLWorkbench (MSW), NetBeans &#8211; with just a few more weeks left, I wanted to take the time to reflect on my experience using a couple of them.</p>
<p>PMA is a web interface for making changes to a live database. You can create, edit, drop, populate and track the tables in your DB. PMA offers a handful of additional, useful features, but the user has to add the settings for them to config.inc.php (and nothing tells you about these features, or how to enable them, when you first set up an account). The WAMP setup I downloaded came with lean config settings in comparison to the PHPMyAdmin user account for the development server at CDRH. One piece of additional functionality enabled by setting up PMA configuration storage (and the feature that has been most helpful in my work) is the designer/relational view. The designer view creates a graphical representation of the tables in which relationships between tables are shown using colored lines from a primary key to its use as a foreign key in another table. You can manipulate the page and rearrange the tables for optimal viewing. You can also collapse and expand the table fields. The default overview you get without PMA configuration storage is the Data Dictionary. This page lists all the tables and their fields; however, it runs linearly down the page in alphabetical order, so relationships are not as apparent without careful scanning up and down the page (which, depending on how many tables you have, could mean lots of scrolling). PMA is a nice tool for making one or two small changes to a live database. However, trying to use the interface to do any more than that is a slow and frustrating process. Every minor change executes a query, so you wait first for the query to execute and then for the page to reload.</p>
<p>According to its About page, “MySQLWorkbench is a cross-platform, visual database design tool developed by MySQL.” I had used MSW before coming to CDRH but without exploring its abilities to reverse and forward engineer. (It also allows you to connect to the server that hosts your MySQL install and perform server administration from within the MSW interface.) For my first design iteration for the Cody tracking DB, I built a model using MSW (since no queries are actually being executed, it’s much faster &#8211; also, this way I had a visual representation of my schema to compare side-by-side with the original Whitman tracking design view in PMA), and then wrote the CREATE script for PMA myself – pretty time intensive considering it was bound to change in significant ways over the next few weeks. Lesson learned: for my next iterations, I simply dropped tables and re-wrote the CREATE scripts for individual changes. (I know, I know.) Finally, Keith explained to me that if I simply export the database in its current form as a .sql file and import it into MSW, I can reverse engineer it to auto-create an EER diagram based on that schema. I can make the changes there and then forward engineer the diagram into a new CREATE script. Lovely, right? Well, only if you don’t have any mistakes. If MSW finds a mistake (e.g., you have a primary key set to UNSIGNED and then, where it appears as a foreign key in another table, it is set to signed), it will ignore it and do its best to produce a CREATE script anyway, sending the errors along with it. Also, the SQL script that MSW generates creates a foreign key constraint with the parentheses left empty and without specifying the table that the foreign key references. The application also has a tendency to crash, and its recovery mode (which, like Microsoft Word’s, shows any files with changes that it had in memory but that weren’t saved before shutdown) is unreliable.</p>
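<p>The signed/unsigned mismatch described above is the kind of error that can be checked for before forward engineering. What follows is only a minimal sketch in Python, not a feature of MSW or PMA: the table names, the column names and the <code>find_fk_type_mismatches</code> helper are all hypothetical, and the schema is assumed to have been copied by hand into a dictionary.</p>

```python
def normalize(col_type):
    """Upper-case a column type and collapse whitespace, so 'int  unsigned' equals 'INT UNSIGNED'."""
    return " ".join(col_type.upper().split())


def find_fk_type_mismatches(tables):
    """Return one tuple per foreign key whose type differs from the key it references."""
    problems = []
    for table_name, table in tables.items():
        for column, (ref_table, ref_column) in table.get("foreign_keys", {}).items():
            child_type = normalize(table["columns"][column])
            parent_type = normalize(tables[ref_table]["columns"][ref_column])
            if child_type != parent_type:
                problems.append((table_name, column, child_type,
                                 ref_table, ref_column, parent_type))
    return problems


# Hypothetical two-table schema (not the actual Cody tracking DB).
schema = {
    "document": {
        "columns": {"id": "INT UNSIGNED", "title": "VARCHAR(255)"},
    },
    "transcription": {
        "columns": {"id": "INT UNSIGNED", "document_id": "INT"},
        "foreign_keys": {"document_id": ("document", "id")},
    },
}

for problem in find_fk_type_mismatches(schema):
    print("%s.%s is %s but %s.%s is %s" % problem)
```

<p>Running a check like this before generating the CREATE script would surface the mismatch up front, rather than leaving it to show up as an error when the script is run.</p>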
<p>Both of these tools have been of great use to me and although I have had my issues with them, (at times due to my own ignorance of certain settings or features), I am grateful to have these design/editing tools for use at no cost. My CDRH project has given me the opportunity to become more proficient at using these tools as well as introducing me to other resources that help decrease time spent on repetitive project tasks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/week-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Winning the trust of &#8220;humanities-people&#8221; through better visualization</title>
		<link>http://www.ischooldh.org/2011/07/winning-the-trust-of-humanities-people-through-better-visualization/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=winning-the-trust-of-humanities-people-through-better-visualization</link>
		<comments>http://www.ischooldh.org/2011/07/winning-the-trust-of-humanities-people-through-better-visualization/#comments</comments>
		<pubDate>Wed, 06 Jul 2011 09:06:49 +0000</pubDate>
		<dc:creator>Sayan Bhattacharyya</dc:creator>
				<category><![CDATA[MATRIX]]></category>
		<category><![CDATA[Summer 2011]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[University of Michigan]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=304</guid>
		<description><![CDATA[&#160; This is Sayan Bhattacharyya again, making his second blog post. As you may remember from my previous blog post, I am a grad student from the University of Michigan interning this summer at MITH, working on the topic modeling &#8230; <a href="http://www.ischooldh.org/2011/07/winning-the-trust-of-humanities-people-through-better-visualization/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>This is Sayan Bhattacharyya again, making his second blog post. As you may remember from my previous blog post, I am a grad student from the University of Michigan interning this summer at MITH, working on the topic modeling project that is underway here. In this post, I will describe the &#8220;what&#8221; and &#8220;why&#8221; of what I have been doing, and  I will try to put it in the wider context of digital humanities.</p>
<p>Travis Brown and others have developed Woodchipper, a visualization tool which runs the Mallet package developed at the University of Massachusetts at Amherst to perform topic modeling on a selected corpus, and then displays the results of a principal-component analysis. An attractive feature of Woodchipper is that it is oriented towards &#8220;drilling down&#8221; &#8212; a concept that is particularly relevant to the digital humanities. Those of us who &#8220;do&#8221; humanities pride ourselves on being <a href="http://en.wikipedia.org/wiki/Close_reading">close readers</a> of texts. To be appealing to humanists, then, topic modeling, insofar as we can think of it as a method of &#8220;<a href="http://mikejohnduff.blogspot.com/2009/11/distant-reading.html">distant reading</a>&#8221;, will need to be combined with close reading. The humanist scholar or researcher who uses an application like Woodchipper would typically want to switch (&#8220;drill down&#8221;) from the &#8220;distant&#8221; view provided by the results generated by the topic model, to individual texts, and the application should provide the affordance necessary for this kind of switching of scale to take place at a mere mouse-click. Woodchipper does this by displaying each page of a text as a clickable data point on a two-dimensional graph, the spatial layout of the graph having been shaped by the results of the principal-component analysis.</p>
<p>Once the user clicks on a data point (i.e. on a selected page from a given document), the &#8220;distant&#8221; aspect of the text &#8212; its high-level attributes, or &#8220;topics&#8221; &#8212; connects with the &#8220;close&#8221; aspect of the text &#8212; its individual words. Visualization of this relationship is, therefore, the crucial part of this model. The challenge this poses for the visualization is the following: why should the researcher trust the high-level attributes that the model tells her the text has? If the visualization can bridge this gap between the high-level and text-level attributes of the text by clearly displaying the connection/relationship between the levels, then and only then will the user be likely to trust the high-level properties discovered by the topic model.</p>
<p>As I mentioned in my last blog post, this is why we decided to work on making the visualization more expressive and richer, so that it could convey more adequately the relationship between the high-level topics discovered by the algorithm for a given page in a text, and the actual contents of that page itself. Earlier, Woodchipper displayed only a specified number of topics that had been adjudged by the algorithm to be the best topics for the page, with the visualization representing each topic by a list of the first few words that were the most representative of that topic. As mentioned in the last blog post, we changed this to incorporate a pie chart which showed how relevant each of these topics was for the page. The idea was to then highlight those words in the page that &#8220;belonged&#8221; to a topic, with the same color as the color of that topic in the pie chart. However, we then realized that it was not possible to trace out this connection between topics and actual words in the document without making further changes to the code in order to refine how we represent topics. A topic is a probability distribution over words, and hence to simply represent a topic as a list of a few selected words is misleading, because, even if those selected words are the highest-probability words in that topic, the actual probability mass represented by each word in that topic may be quite different. It would be more logical and more expressive, therefore, to represent a topic by those words which, together, add up to a certain specified fraction of the total probability mass. Doing so necessitates changing the Scala code on the server side, which furnishes these words, before the Woodchipper client accesses it. Instead of passing the top x words for that topic to the client, we would now be passing to the client the top words (however many they may be) that together add up to a specified fraction of the total probability mass, together with the probability of each word (for that particular topic).</p>
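<p>The selection rule described above can be sketched roughly as follows. This is an illustration in Python, not the Scala code that runs on the Woodchipper server; the <code>top_words_by_mass</code> name and the toy probabilities are made up for the example.</p>

```python
def top_words_by_mass(topic, mass=0.5):
    """Return the highest-probability (word, probability) pairs that together
    cover at least `mass` of the topic's total probability."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    ranked = sorted(topic.items(), key=lambda pair: pair[1], reverse=True)
    total = sum(topic.values())
    selected, covered = [], 0.0
    for word, probability in ranked:
        selected.append((word, probability))
        covered += probability
        if covered >= mass * total:
            break
    return selected


# A toy topic: a probability distribution over five words (made-up numbers).
topic = {"ship": 0.40, "sea": 0.25, "captain": 0.15, "storm": 0.10, "harbor": 0.10}

print(top_words_by_mass(topic, mass=0.6))
```

<p>With a threshold of 0.6, only &#8220;ship&#8221; and &#8220;sea&#8221; are passed along, each with its probability; a flatter topic would need more words to reach the same threshold, which is exactly the information a fixed-length word list hides.</p>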
<p>We also realized that a further change needed to be made. Each page was too small, so that, very often, no word in the page actually matched the top words in the topics for that page. We realized that we probably needed to break up the documents into units larger than physical pages, in order to show the user a more trustworthy picture of how the top level (&#8220;topics&#8221;) meshes and connects with the bottom level (&#8220;words on a specific page&#8221;) when we metaphorically &#8220;drill down&#8221; from top to bottom.</p>
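<p>One possible way of forming such larger units is to merge consecutive pages until a minimum number of tokens is reached. The sketch below, in Python, is only an assumption about how this could be done, not a description of what Woodchipper does; the <code>merge_pages</code> name, the whitespace tokenization and the threshold are all hypothetical.</p>

```python
def merge_pages(pages, min_tokens=500):
    """Merge consecutive pages into chunks of at least `min_tokens` tokens.

    A short remainder at the end is folded into the last chunk rather than
    being left as an undersized unit of its own.
    """
    chunks, current = [], []
    for page in pages:
        current.extend(page.split())  # naive whitespace tokenization
        if len(current) >= min_tokens:
            chunks.append(current)
            current = []
    if current:
        if chunks:
            chunks[-1].extend(current)
        else:
            chunks.append(current)
    return chunks


# Toy example with a tiny threshold so the behaviour is easy to follow.
toy_pages = ["a b c", "d e", "f g h i", "j"]
for chunk in merge_pages(toy_pages, min_tokens=4):
    print(chunk)
```

<p>Larger units give each chunk more words, which should make it more likely that some of them match the top words of the chunk&#8217;s topics, at the cost of a coarser &#8220;drill down&#8221;.</p>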
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/winning-the-trust-of-humanities-people-through-better-visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Woodchipper</title>
		<link>http://www.ischooldh.org/2011/07/first-blog-post/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=first-blog-post</link>
		<comments>http://www.ischooldh.org/2011/07/first-blog-post/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 20:53:43 +0000</pubDate>
		<dc:creator>Clay Templeton</dc:creator>
				<category><![CDATA[MITH]]></category>
		<category><![CDATA[Summer 2011]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=286</guid>
		<description><![CDATA[Hello World My name is Clay Templeton, and I’ve come to MITH from the University of Maryland’s iSchool, just across the McKeldin Mall.  This is my first blog post inside the iSchools and Digital Humanities space. At MITH, I’m working &#8230; <a href="http://www.ischooldh.org/2011/07/first-blog-post/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Hello World </strong></p>
<p>My name is Clay Templeton, and I’ve come to MITH from the University of Maryland’s <a href="http://ischool.umd.edu/">iSchool</a>, just across the McKeldin Mall.  This is my first blog post inside the iSchools and Digital Humanities space.</p>
<p>At MITH, I’m working with R&amp;D Software Developer Travis Brown.  Together, we’re developing topic modeling applications in the digital humanities, evolving the <a href="http://mith.umd.edu/corporacamp/tool.php">Woodchipper</a> code base and bringing related topic modeling approaches into the field.</p>
<p>This post can be read as an exegesis of Travis’s page on Woodchipper.  It reflects my evolving understanding and, I hope, will stimulate further thinking.</p>
<p><strong>A Few Technical Details</strong></p>
<p>Woodchipper is written in <a href="http://www.scala-lang.org/">Scala</a>, using the <a href="http://liftweb.net/">Lift library</a> for Web development.  As a user operates Woodchipper, Scala and Lift post Web search form input to a server side database and retrieve lists of texts for possible inclusion in the visualization.  After the user selects texts for inclusion, the application consults a database of structural elements (populated in advance) and performs calculations to produce the visualizations.</p>
<p>Scala and Lift are new to me.  As I work through the code, I&#8217;m developing a semi-narrative account of what happens inside the application when a user requests a visualization.  This could provide a jumping-off point for extensive documentation down the line, the kind of documentation that might be helpful in a formal release of version 1.0 code.  Anticipating this eventuality, Travis and I are also keeping a log of what needs to be changed and rearranged inside the code.</p>
<p><strong>What Woodchipper Does: Shredding Corpora and Building Structure</strong></p>
<p>Woodchipper shreds corpora into words and synthesizes abstract elements of hidden structure, so that each text is reconceived as a mixture of abstract structural elements.  The structural elements that Woodchipper posits have no figuration, but are represented by lists of words.  Every structural element is made up of the same words; each list includes all the words found in the corpora.  The lists differ in the order of words.  Order is determined by how surprised we should be to see a word if we know that its appearance is motivated by the structural element in question.</p>
<p>At the same time that Woodchipper develops its structural theory of the corpus, it credits each token in each text to one of the structural elements it creates.  In fact, the proposed structure and the attribution of tokens to structural elements are adjusted iteratively to approach the most mathematically plausible arrangement.  Each text is thus plottable in a multidimensional space whose dimensions correspond to structural elements.  The magnitude along each dimension is simply the fraction of tokens credited to that structural element.</p>
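<p>The last step above, turning the attribution of tokens into coordinates, can be sketched in a few lines. This is an illustration in Python, not Woodchipper&#8217;s own Scala code; the function name and the toy assignments are hypothetical.</p>

```python
from collections import Counter


def topic_proportions(assignments, num_topics):
    """Given the structural element credited with each token of a text,
    return the fraction of tokens credited to each element."""
    total = len(assignments)
    if total == 0:
        return [0.0] * num_topics
    counts = Counter(assignments)
    return [counts[k] / total for k in range(num_topics)]


# Eight tokens of one text, each credited to structural element 0, 1 or 2.
page_assignments = [0, 2, 2, 1, 2, 0, 2, 2]
print(topic_proportions(page_assignments, num_topics=3))  # [0.25, 0.125, 0.625]
```

<p>The resulting list is the text&#8217;s position in the multidimensional space: one coordinate per structural element, with the coordinates summing to one.</p>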
<p>All these dimensions of structure are unwieldy.  To visualize the corpus, Woodchipper chooses an optimal two-dimensional coordinate plane, in effect rotating the multidimensional space so that the viewer’s gaze is positioned to register the maximum possible variation among the texts.</p>
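<p>This kind of rotation is what a principal-component analysis computes. Below is a minimal sketch in Python rather than the Scala that Woodchipper itself is written in, assuming each text has already been reduced to a vector of proportions over the structural elements. The <code>principal_axes</code> helper and the toy numbers are hypothetical, and simple power iteration stands in for whatever linear algebra routine Woodchipper actually uses.</p>

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def principal_axes(rows, k=2, iterations=200):
    """Center the rows and return (centered_rows, top-k principal axes)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    axes = []
    for _axis in range(k):
        # Non-uniform start: proportions sum to 1, so the covariance matrix
        # maps the all-ones vector to zero and would stall the iteration.
        v = [float(i + 1) for i in range(d)]
        for _step in range(iterations):
            for a in axes:  # deflation: stay orthogonal to axes already found
                proj = dot(v, a)
                v = [x - proj * y for x, y in zip(v, a)]
            w = [dot(cov[i], v) for i in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        axes.append(v)
    return centered, axes


# Toy data: four texts described by three proportions each (made-up numbers).
texts = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
centered, axes = principal_axes(texts)
coords = [[dot(row, axis) for axis in axes] for row in centered]
for point in coords:
    print(point)
```

<p>Each printed pair is a text&#8217;s position on the two-dimensional plane; the first axis is the direction along which the texts differ most, and the second is the direction of greatest remaining variation at right angles to it.</p>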
<p>This promise of emergent structure, free from human interference, is almost certainly what makes Woodchipper so appealing.  It’s true:  Woodchipper escapes literary, philosophical, and sociological categories via appeal to mathematical probability, iconoclastically inducing structure based on patterns of word co-occurrence alone. The assumptions involved are minimal, if imperfect: that spaces (“ ”) delimit units of meaning; that orthographic identity implies semantic identity; and that each token in a text is generated, probabilistically, from a finite set of structural elements.  Given these assumptions and a corpus, the technique of “<a href="http://portal.acm.org/citation.cfm?id=944937">topic modeling</a>” performs the computationally heavy lifting.</p>
<p>Human intelligence must be brought into the loop eventually to figure out the critical account Woodchipper seems to be proposing.  To facilitate human involvement, Woodchipper projects both the texts and the original structural dimensions onto the same coordinate plane, which permits direct comparison between text and structure.  It&#8217;s up to the researcher to assign significance to the abstract structural elements.  This work is typically accomplished with reference to the word lists comprising each structural element, the graphs linking structure to text, and the texts themselves.  In Travis’s example, texts are drawn from Jane Austen’s Emma and Byron’s Don Juan.  Structural elements are interpreted in light of their relation to texts on the coordinate plane chosen by Woodchipper.  Assigned an identity, structural elements shed light on major differences and similarities between the corpora.</p>
<p><strong>Next Blog Post</strong></p>
<p>In my next blog post, I expect to discuss advanced topic modeling approaches available for import into the digital humanities, with special attention to continuity among existing MITH projects and other established work within the field.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/first-blog-post/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Week 5</title>
		<link>http://www.ischooldh.org/2011/07/week-5/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=week-5</link>
		<comments>http://www.ischooldh.org/2011/07/week-5/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 12:07:08 +0000</pubDate>
		<dc:creator>j.meyerson</dc:creator>
				<category><![CDATA[CDRH]]></category>
		<category><![CDATA[Summer 2011]]></category>

		<guid isPermaLink="false">http://www.ischooldh.org/?p=345</guid>
		<description><![CDATA[This was a week full of Digital Humanities introductions and opportunities. It started with Tuesday&#8217;s observation of Whitman Camp which included folks from all participating institutions including Ed Folsom of the University of Iowa, Ken Price of UNL, Liz Lorang of UNL, Matt &#8230; <a href="http://www.ischooldh.org/2011/07/week-5/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ischooldh.org/2011/07/week-5/1953193944_small-2/" rel="attachment wp-att-346"><img class="alignleft size-full wp-image-346" title="1953193944_small" src="http://www.ischooldh.org/wp-content/uploads/2011/07/1953193944_small1.jpg" alt="" width="150" height="68" /></a>This was a week full of Digital Humanities introductions and opportunities. It started with Tuesday&#8217;s observation of Whitman Camp which included folks from all participating institutions including Ed Folsom of the University of Iowa, Ken Price of UNL, Liz Lorang of UNL, Matt Miller from Yeshiva University, Matt Cohen from University of Texas, Brett Barney of UNL, and Brian Pytlik Zillig of UNL among others.</p>
<p>It was edifying to see how a dispersed working group of scholars gets together and makes decisions, how they structure their time together, what the major points of contention are and what new directions they envision for the project. There was a lengthy discussion about info architecture on the main page of the archive. It would seem, particularly to an iSchooler, that those kinds of decisions might more suitably be made by an information architect, or that (considering the digital format) the site might offer the user several different ways of organizing the information. On the other hand, this is a scholarly editing project, and as such the way that information is presented to the user, including the placement of links on the page and the titles of browsable categories, reflects editorial decisions grounded in the expertise and rhetorical goals of the working group. It is also important to mention that members were conscientious, in that discussion, about not hearkening back to a print culture approach to publishing on the web, and about instead embracing the affordances of digital publishing as a fundamentally different practice.</p>
<p>There were also discussions about recent content that has been partially processed (i.e., scanned, transcribed, translated, etc.) but not yet encoded. For example, there is a large subset of Whitman materials in PDF format. They were digitized in the early days of the archive and existed as high quality TIFF files that were then converted into PDFs. Some of the TIFF images were OCRed and converted into a PDF with an invisible overlay of the OCRed text that can be harvested as plain text and reformatted. However, many of these items were not OCRed and exist only as images. The archive is dedicated to encoding all documents that go up, but this particular image-only PDF subset would have to be retrofitted amidst a backlog of recently transcribed documents of some length that also need to be encoded. Brian introduced eXist&#8217;s capability of allowing for much more complex XQuery searches across XML documents; however, it is an XML database and therefore requires all content to be in XML format.</p>
<p>The group decided not to retrofit the entire sub-collection of items, but took the opportunity to add OCR to the workflow for all similar items in the future. It&#8217;s possible that, using one of Brian&#8217;s tools, they may be able to harvest the text from the searchable PDFs and convert it into TEI documents en masse. This brings up another point &#8211; decisions made concerning implementation of new technologies, or things that fundamentally change how content is organized and eventually served up to the user, have to be made with an eye on a sustainable future &#8211; and that also means consideration of the learning curve for new people who are coming in and will be charged with maintenance and improvement of the current system.</p>
<p>I had lunch with Matt Cohen and Nicole Gray. It was a great opportunity to hear about Matt and Nicole&#8217;s experience doing DH work at UT and about who back home has been involved in shaping it. Matt shared some history on TACC, the Texas Advanced Computing Center. Interestingly, just a couple of years ago, they hired a Humanities liaison to reach out to disciplines that, historically, have not embraced a quantitative/computing approach to research. It will be interesting to delve more deeply into the response to that outreach effort when I get back to Texas.</p>
<p>It was a good week for me. Observing. Taking notes. Seeing a dispersed scholarly working group in action. Discussing the possibilities and grander visions for the future of DH at my university. And it ended with a bang. The sound of fireworks began on Thursday and didn&#8217;t end until very late on Monday night. It was really something.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ischooldh.org/2011/07/week-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

