How to extract images from an uploaded Word document in Shiny
I am working on a Shiny app that reads Word documents uploaded by users. The app displays a table of all elements in the uploaded document and their formatting. I want it to also show any pictures from the uploaded Word doc. Documents containing multiple images aren't an issue: users will only ever upload documents with one image.
To do this, I am using the officer package. It has a function called media_extract that does exactly what I want. The issue is that, while the documentation says this function can extract images from .docx or .pptx files, I can only get it to work for the latter. This is because media_extract takes the image file path as an argument, but I cannot generate a file path for Word docs. The file path comes from one of two officer functions, depending on the file type: docx_summary or pptx_summary. These are also the functions I use to generate the tables rendered in my app. pptx_summary creates a table with a media_file column, which holds a file path for image elements, while docx_summary generates no such column. Absent that column and the path it includes, I don't know how to extract images from Word docs using this function.
For your convenience, here is my code for two Shiny apps: one that reads PowerPoint files and one for Word docs. If you upload a PowerPoint file and a Word file that each include an image, you will see how the tables generated in each app differ. My PowerPoint app also renders an image, to show how that is done. That functionality is missing from my Word app.
PowerPoint reader app:
library(officer)
library(DT)
library(shiny)

ui <- fluidPage(
  titlePanel("Document Scanner"),
  sidebarLayout(
    sidebarPanel(
      # officer reads .pptx only, so restrict the upload to that type
      fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                accept = ".pptx")
    ),
    mainPanel(
      tags$h3(tags$b("Document Summary")),
      br(),
      DT::dataTableOutput("display_table"),
      br(),
      imageOutput("myImage")
    )
  )
)

server <- function(input, output) {
  # reactive holding the uploaded presentation
  x <- reactive({
    req(input$uploadedfile)
    read_pptx(input$uploadedfile$datapath)
  })

  # render the formatting table
  output$display_table <- DT::renderDataTable({
    DT::datatable(pptx_summary(x()))
  })

  # render the image from the presentation
  output$myImage <- renderImage({
    readFile <- x()
    fileSummaryDF <- pptx_summary(readFile)
    # get the path to the image (basically straight from the
    # documentation for media_extract)
    fileSummaryDF_filtered <- fileSummaryDF[fileSummaryDF$content_type %in% "image", ]
    media_file <- fileSummaryDF_filtered$media_file
    # stop quietly if the presentation contains no image
    req(length(media_file) > 0)
    png_file <- tempfile(fileext = ".png")
    media_extract(readFile, path = media_file[1], target = png_file)
    list(src = png_file, alt = "Test Picture")
  }, deleteFile = TRUE)
}

shinyApp(ui, server)
Word reader app:
library(officer)
library(DT)
library(shiny)

ui <- fluidPage(
  titlePanel("Word Doc Scanner"),
  sidebarLayout(
    sidebarPanel(
      # officer reads .docx only, not legacy .doc files
      fileInput("uploadedfile", "Upload a file", multiple = FALSE,
                accept = ".docx")
    ),
    mainPanel(
      tags$h3(tags$b("Document Summary")),
      br(),
      DT::dataTableOutput("display_table"),
      imageOutput("image1")
    )
  )
)

server <- function(input, output) {
  # reactive summary table built from the uploaded file
  x <- reactive({
    req(input$uploadedfile)
    docDF <- read_docx(path = input$uploadedfile$datapath)
    docx_summary(docDF)
  })

  # render the formatting table
  output$display_table <- DT::renderDataTable({
    DT::datatable(x())
  })

  # how to render the image without an image path anywhere in the table?
}

shinyApp(ui, server)
If this can't be done in officer, then I'm happy to do it a different way. Thank you.
r shiny officer
asked Nov 13 '18 at 21:16
IanG
It's really just a ZIP file. Rename it to a tempfile with a .zip extension and use unzip() to unzip it. Look for a word/media/ subdir and the images are there. The source for the docxtractr package has code for the zip part.
– hrbrmstr
Nov 14 '18 at 2:09
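A minimal sketch of the ZIP approach described in this comment, using only base R. The helper names is_media_entry() and extract_docx_media() are made up for illustration and are not part of officer or docxtractr. As far as I know, unzip() does not need the file to be renamed to .zip first, but treat that as an assumption and rename if it fails.

```r
# Sketch: pull embedded images out of a .docx by treating it as a ZIP archive.
# `is_media_entry()` and `extract_docx_media()` are hypothetical helper names.

# TRUE for archive entries that sit directly under word/media/
# (the directory entry itself and anything in nested folders is skipped).
is_media_entry <- function(entries) {
  grepl("^word/media/[^/]+$", entries)
}

# Unzip only the media entries of `docx_path` into `exdir` and return the
# paths of the extracted files (character(0) if the document has no images).
extract_docx_media <- function(docx_path, exdir = tempfile("docx_media_")) {
  entries <- utils::unzip(docx_path, list = TRUE)$Name
  media <- entries[is_media_entry(entries)]
  if (length(media) == 0) {
    return(character(0))
  }
  utils::unzip(docx_path, files = media, exdir = exdir)
  file.path(exdir, media)
}
```

In the Word app, the first element of extract_docx_media(input$uploadedfile$datapath) could then be returned from renderImage() as list(src = ..., alt = ...).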
I was indeed able to find the image by going to the word/media/ subdir and thus create the path to it. That was simple enough. I did not need to do any unzipping as you mentioned. Anyway, problem solved. Thank you.
– IanG
Nov 14 '18 at 21:16
A Word doc is a zipped archive, so something had to do the unzipping.
– hrbrmstr
Nov 14 '18 at 21:17
Then I assume the read_docx function from the officer package must have unzipped it.
– IanG
Nov 15 '18 at 5:00
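The exchange above suggests that read_docx already unpacks the archive somewhere on disk. A hedged sketch of building on that: it assumes the object returned by read_docx() stores the unpacked folder in a package_dir element, which is an internal detail of officer rather than a documented interface, so it may differ between versions. list_unpacked_media() and first_docx_image() are hypothetical names introduced here.

```r
# Sketch: locate images in the folder that officer unpacked the .docx into.
# ASSUMPTION: the object returned by officer::read_docx() exposes that folder
# as `$package_dir`; this is an internal detail, not a documented interface.
# `list_unpacked_media()` and `first_docx_image()` are hypothetical helpers.

# Return the full paths of the files under <package_dir>/word/media,
# or character(0) when that folder does not exist.
list_unpacked_media <- function(package_dir) {
  media_dir <- file.path(package_dir, "word", "media")
  if (!dir.exists(media_dir)) {
    return(character(0))
  }
  list.files(media_dir, full.names = TRUE)
}

# Read an uploaded .docx with officer and return the path of its first
# embedded image, or NULL when the document contains none.
first_docx_image <- function(docx_path) {
  doc <- officer::read_docx(path = docx_path)
  media <- list_unpacked_media(doc$package_dir)
  if (length(media) == 0) NULL else media[[1]]
}
```

Inside the Word app's server, the path from first_docx_image(input$uploadedfile$datapath) could be passed to renderImage() as list(src = ..., alt = ...) with deleteFile = FALSE, guarded by req() so nothing is drawn for documents without an image.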