Uploading text files

Uploading multiple text, doc, docx, or PDF files

Unstructured documents

With some kinds of import, the exact structure matters a lot. With text import, however, you can import documents with no structure at all if you wish. Each document is imported as one source. If you want, you can provide additional metadata (e.g. year of publication) by also uploading an Excel sheet containing it.

Structured documents: question ID

If you wish, and if it makes sense to do so, you can also structure your documents into sections which we call questions, with a question ID for each section.
The app can then use the question ID to help you analyse your data, e.g., to filter only for answers to a particular question.
Note that, as usual, the IDs must be simple alphanumeric labels with no spaces. For example:
question_id: QH (Optionally some more text here which is not part of the ID)
Yes, my health is better overall …
question_id: QJ (Optionally some more text here)
The floods affected my income…
It does not matter whether every question appears in every document, or even if some questions appear in only one document.
Note that all the text following a section header will be included as part of that question, until the next question ID or the end of the document.
Note that if, for some reason, the same question ID is repeated within the same document, the text will be merged.
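The sectioning rules above can be illustrated with a minimal Python sketch. This is not the app's own code: the function name split_by_question is hypothetical, the ID pattern (letters and digits only) is an assumption based on the "alphanumeric, no spaces" rule, and ignoring any text before the first header is a simplification made here.

```python
import re

# Assumed header shape: "question_id: <ID>", optionally followed by free text.
# Letters and digits only is an assumption, not a confirmed rule of the app.
QUESTION_HEADER = re.compile(r"^question_id:\s*([A-Za-z0-9]+)")


def split_by_question(text):
    """Return {question_id: text}, merging sections that repeat an ID."""
    sections = {}
    current = None  # no question header seen yet
    for line in text.splitlines():
        line = line.strip()
        match = QUESTION_HEADER.match(line)
        if match:
            # Text after the ID on the header line is not part of the ID.
            current = match.group(1)
            sections.setdefault(current, [])
        elif line and current is not None:
            # Everything up to the next header belongs to the current question.
            sections[current].append(line)
    return {qid: "\n".join(lines) for qid, lines in sections.items()}


document = """question_id: QH (Health)
Yes, my health is better overall
question_id: QJ (Income)
The floods affected my income
question_id: QH
I sleep better too"""

sections = split_by_question(document)
print(sections["QH"])
```

Here the two QH sections end up as one entry, which mirrors the merging behaviour described above.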

Structured documents: source ID

Usually we assume that one document comes from one source. But sometimes you might have sources mixed within longer documents.
If you wish, and if it makes sense to do so, you can also structure your documents by source_id.
 
source_id: M23 (Optionally some more text here)
My health is better overall …
source_id: M24 (Optionally some more text here)
But my health is worse…
This means that, within each file, any statement that does not fall under a source_id tag takes the document title as its source ID; all other statements take their source ID from the in-document tags.
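As a rough illustration of this fallback rule, the sketch below assigns a source ID to every line of a document. It is only a sketch under stated assumptions: assign_sources and the title "interviews2021" are hypothetical, each non-empty line is treated as one statement, and the exact form of the document title used by the app is not specified here.

```python
import re

# Assumed header shape: "source_id: <ID>", optionally followed by free text.
SOURCE_HEADER = re.compile(r"^source_id:\s*([A-Za-z0-9]+)")


def assign_sources(text, document_title):
    """Return a list of (source_id, statement) pairs, one per non-empty line."""
    current = document_title  # fallback until a source_id tag appears
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SOURCE_HEADER.match(line)
        if match:
            current = match.group(1)  # later statements belong to this source
        else:
            pairs.append((current, line))
    return pairs


document = """General notes from the visit
source_id: M23 (Farmer)
My health is better overall
source_id: M24
But my health is worse"""

pairs = assign_sources(document, "interviews2021")
for source, statement in pairs:
    print(source, "|", statement)
```

The first line has no tag above it, so in this sketch it is attributed to the document title.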

Structured documents: question ID and source ID

If you wish, and if it makes sense to do so, you can also structure your documents into two levels of structure:
  • source_id
  • question_id
and the app expects questions to be nested inside sources, like this:
source_id: M23 (Optionally some more text here)
question_id: QH (Optionally some more text here)
Yes, my health is better overall …
question_id: QJ (Optionally some more text here)
The floods affected my income…
You can have more than one source inside each document.
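The two-level structure can be pictured as a nested mapping from source to question to statements. The following sketch is an assumption about how such a file could be read, not the app's implementation: parse_nested is a hypothetical name, and resetting the current question whenever a new source starts is a choice made for this example.

```python
import re

# Either tag, then an alphanumeric ID; trailing text on the header line is ignored.
HEADER = re.compile(r"^(source_id|question_id):\s*([A-Za-z0-9]+)")


def parse_nested(text, document_title):
    """Return {source_id: {question_id: [statements]}}."""
    source, question = document_title, None
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match and match.group(1) == "source_id":
            # Assumption: a new source starts with no current question.
            source, question = match.group(2), None
        elif match:
            question = match.group(2)
        else:
            result.setdefault(source, {}).setdefault(question, []).append(line)
    return result


document = """source_id: M23 (Farmer)
question_id: QH (Health)
Yes, my health is better overall
question_id: QJ (Income)
The floods affected my income
source_id: M24
question_id: QH
But my health is worse"""

nested = parse_nested(document, "interviews2021")
print(nested["M23"]["QJ"])
```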

Structured documents: source IDs nested within question IDs

You can even do it the other way round:
 
question_id: QH (Optionally some more text here)
source_id: M23 (Optionally some more text here)
Yes, my health is better overall …
source_id: M44 (Optionally some more text here)
No, my health is worse overall …
question_id: QJ (Optionally some more text here)
source_id: M44 (Optionally some more text here)
The floods affected my income…
The proviso is that the default listing of the statements will still show questions nested within sources, as in every other case, not sources nested within questions as in the transcript.
You can still use the statements panel to search for individual questions, and the corresponding sections will then appear as in the transcript.
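To see why the listing order differs from the transcript order, the sketch below tags each statement with the most recent source and question IDs and then regroups by source first. It is a hedged illustration only: list_by_source is a hypothetical name and the grouping logic is an assumption about the behaviour described above.

```python
import re

HEADER = re.compile(r"^(source_id|question_id):\s*([A-Za-z0-9]+)")


def list_by_source(text):
    """Tag each statement with the latest source and question IDs,
    then regroup as {source_id: {question_id: [statements]}}."""
    current = {"source_id": None, "question_id": None}
    listing = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current[match.group(1)] = match.group(2)  # update whichever tag was seen
        else:
            by_question = listing.setdefault(current["source_id"], {})
            by_question.setdefault(current["question_id"], []).append(line)
    return listing


transcript = """question_id: QH (Health)
source_id: M23
Yes, my health is better overall
source_id: M44
No, my health is worse overall
question_id: QJ (Income)
source_id: M44
The floods affected my income"""

listing = list_by_source(transcript)
for source, questions in listing.items():
    print(source, questions)
```

Although the transcript is organised by question, both of M44's answers end up together under M44 in the regrouped listing.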

PDF files

Simple PDF files import just fine. However, badly structured PDFs can be a problem. Sometimes documents with multiple columns will be read incorrectly. The app is unlikely to import tables in a useful way. Pictures are not imported.

Forced breaks

Each paragraph (or, in the case of PDFs, each page) will be one statement. At the moment there is no way to force a statement to split into smaller pieces or to join up smaller statements. Let us know if this is a problem.
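The paragraph rule can be pictured with a very small sketch. It assumes that a paragraph in a plain text file is a run of text ending at a line break, which may not match how the app detects paragraphs in every file type; to_statements is a hypothetical name.

```python
def to_statements(text):
    """Split text into statements, one per non-empty paragraph.

    Assumption: each line break ends a paragraph; empty lines are skipped.
    """
    return [line.strip() for line in text.split("\n") if line.strip()]


text = "My health is better overall.\n\nThe floods affected my income."
statements = to_statements(text)
print(len(statements))
```

Under this assumption, a long paragraph stays a single statement however many sentences it contains, which is why paragraph breaks in the source document matter.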