Preparing Files for Text & Data Mining

Wednesday, February 13, 2019 at 10:30 AM until 11:20 AM Eastern Standard Time UTC -05:00

CDS, Hesburgh Library 2nd Floor, Consultation Room 247
United States

Text mining, a process for extracting information from unstructured text, requires everyday files (PDF, Word, HTML, etc.) to be transformed into plain text files. Once your files are in plain text format (no bold, no italics, no underlining, etc.), they are ready for automated processing and computer analysis.

This hands-on workshop will demonstrate and facilitate the use of a free Java-based program called Tika to do this work.

More specifically, this workshop will help attendees install Tika and use it to convert just about any file into plain text. With plain text files in hand, participants will be able to use the many text mining services available on the internet.
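As a rough illustration of the conversion step, the Python sketch below calls the Tika application jar from the command line and saves the extracted text beside each input file. It is a minimal sketch, not the workshop's own procedure: it assumes Java is installed, that the Tika application jar has been downloaded, and that it sits in the current directory under the hypothetical name tika-app.jar. The function names tika_command and convert_to_text are invented for this example.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the downloaded Tika application jar; adjust to your setup.
TIKA_JAR = Path("tika-app.jar")


def tika_command(source: Path, jar: Path = TIKA_JAR) -> list[str]:
    """Build the command line that asks Tika for plain text output."""
    return ["java", "-jar", str(jar), "--text", str(source)]


def convert_to_text(source: Path, jar: Path = TIKA_JAR) -> Path:
    """Run Tika on one file and save the extracted text next to it as .txt."""
    result = subprocess.run(tika_command(source, jar), capture_output=True, check=True)
    target = source.with_suffix(".txt")
    # Write the raw bytes so Tika's output encoding is preserved as-is.
    target.write_bytes(result.stdout)
    return target


if __name__ == "__main__":
    # Convert every file named on the command line.
    for name in sys.argv[1:]:
        print(convert_to_text(Path(name)))
```

Saved under a name of your choosing, such as to_text.py, the script could be run as "python to_text.py report.pdf notes.docx", which would leave report.txt and notes.txt alongside the originals.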

Registration is no longer available because the registration deadline has passed.