Oracle Text will not work with NVARCHAR2. What else might be unavailable?












22















We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.



We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.



Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?



Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?



Thank you for your answers.



Note following Justin's answer:



Thank you for your answer. I will discuss your points, applied to our case:



Our application is usually alone on the Oracle database and takes care of the
data itself. Other software that connect to the database are limited to Toad,
Tora or SQL developer.



We also use SQL*Loader and SQL*Plus to communicate with the database for basic
statements or to upgrade between versions of the product. We have
not heard of any specific problem with all those software regarding NVARCHAR2.



We are also not aware that database administrators among our customers would
like to use other tools on the database that could not support data on
NVARCHAR2 and we are not really concerned whether their tools might disrupt,
after all they are skilled in their job and may find other tools if necessary.



Your last two points are more insightful for our case. We do not use many
built-in packages from Oracle but it still happens. We will explore that
problem.



Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to
store UTF-16, has to perform encoding conversions on all processed data?










share|improve this question





























    22















    We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.



    We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.



    Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?



    Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?



    Thank you for your answers.



    Note following Justin's answer:



    Thank you for your answer. I will discuss your points, applied to our case:



    Our application is usually alone on the Oracle database and takes care of the
    data itself. Other software that connect to the database are limited to Toad,
    Tora or SQL developer.



    We also use SQL*Loader and SQL*Plus to communicate with the database for basic
    statements or to upgrade between versions of the product. We have
    not heard of any specific problem with all those software regarding NVARCHAR2.



    We are also not aware that database administrators among our customers would
    like to use other tools on the database that could not support data on
    NVARCHAR2 and we are not really concerned whether their tools might disrupt,
    after all they are skilled in their job and may find other tools if necessary.



    Your last two points are more insightful for our case. We do not use many
    built-in packages from Oracle but it still happens. We will explore that
    problem.



    Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to
    store UTF-16, has to perform encoding conversions on all processed data?










    share|improve this question



























      22












      22








      22


      4






      We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.



      We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.



      Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?



      Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?



      Thank you for your answers.



      Note following Justin's answer:



      Thank you for your answer. I will discuss your points, applied to our case:



      Our application is usually alone on the Oracle database and takes care of the
      data itself. Other software that connect to the database are limited to Toad,
      Tora or SQL developer.



      We also use SQL*Loader and SQL*Plus to communicate with the database for basic
      statements or to upgrade between versions of the product. We have
      not heard of any specific problem with all those software regarding NVARCHAR2.



      We are also not aware that database administrators among our customers would
      like to use other tools on the database that could not support data on
      NVARCHAR2 and we are not really concerned whether their tools might disrupt,
      after all they are skilled in their job and may find other tools if necessary.



      Your last two points are more insightful for our case. We do not use many
      built-in packages from Oracle but it still happens. We will explore that
      problem.



      Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to
      store UTF-16, has to perform encoding conversions on all processed data?










      share|improve this question
















      We are going to migrate an application to have it support Unicode and have to choose between unicode character set for the whole database, or unicode columns stored in N[VAR]CHAR2.



      We know that we will no more have the possibility of indexing column contents with Oracle Text if we choose NVARCHAR2, because Oracle Text can only index columns based on the CHAR type.



      Apart that, is it likely that other major differences arise when harvesting from Oracle possibilities?



      Also, is it likely that some new features are added in newer versions of Oracle, but only supporting either CHAR columns or NCHAR columns but not both?



      Thank you for your answers.



      Note following Justin's answer:



      Thank you for your answer. I will discuss your points, applied to our case:



      Our application is usually alone on the Oracle database and takes care of the
      data itself. Other software that connect to the database are limited to Toad,
      Tora or SQL developer.



      We also use SQL*Loader and SQL*Plus to communicate with the database for basic
      statements or to upgrade between versions of the product. We have
      not heard of any specific problem with all those software regarding NVARCHAR2.



      We are also not aware that database administrators among our customers would
      like to use other tools on the database that could not support data on
      NVARCHAR2 and we are not really concerned whether their tools might disrupt,
      after all they are skilled in their job and may find other tools if necessary.



      Your last two points are more insightful for our case. We do not use many
      built-in packages from Oracle but it still happens. We will explore that
      problem.



      Could we also expect performance breakage if our application (that is compiled under Visual C++), that uses wchar_t to
      store UTF-16, has to perform encoding conversions on all processed data?







      oracle unicode character-encoding nvarchar






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 10 '10 at 9:15







      Benoit

















      asked Dec 9 '10 at 17:07









      BenoitBenoit

      58.9k15166212




      58.9k15166212
























          1 Answer
          1






          active

          oldest

          votes


















          29














          If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.




          • There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.

          • For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.

          • Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.

          • In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.


          Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.



          Responding to your followups




          Our application is usually alone on
          the Oracle database and takes care of
          the data itself. Other software that
          connect to the database are limited to
          Toad, Tora or SQL developer.




          What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.



          I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.




          We also use SQL*Loader and SQL*Plus to
          communicate with the database for
          basic statements or to upgrade between
          versions of the product. We have not
          heard of any specific problem with all
          those software regarding NVARCHAR2.




          Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.




          We are also not aware that database
          administrators among our customers
          would like to use other tools on the
          database that could not support data
          on NVARCHAR2 and we are not really
          concerned whether their tools might
          disrupt, after all they are skilled in
          their job and may find other tools if
          necessary.




          While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.




          Could we also expect performance
          breakage if our application (that is
          compiled under Visual C++), that uses
          wchar_t to store UTF-16, has to
          perform encoding conversions on all
          processed data?




          I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.



          My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.






          share|improve this answer


























          • Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

            – Benoit
            Dec 10 '10 at 8:28











          • @Benoit - Posted some updates.

            – Justin Cave
            Dec 10 '10 at 17:18











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f4401043%2foracle-text-will-not-work-with-nvarchar2-what-else-might-be-unavailable%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          29














          If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.




          • There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.

          • For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.

          • Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.

          • In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.


          Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.



          Responding to your followups




          Our application is usually alone on
          the Oracle database and takes care of
          the data itself. Other software that
          connect to the database are limited to
          Toad, Tora or SQL developer.




          What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.



          I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.




          We also use SQL*Loader and SQL*Plus to
          communicate with the database for
          basic statements or to upgrade between
          versions of the product. We have not
          heard of any specific problem with all
          those software regarding NVARCHAR2.




          Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.




          We are also not aware that database
          administrators among our customers
          would like to use other tools on the
          database that could not support data
          on NVARCHAR2 and we are not really
          concerned whether their tools might
          disrupt, after all they are skilled in
          their job and may find other tools if
          necessary.




          While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.




          Could we also expect performance
          breakage if our application (that is
          compiled under Visual C++), that uses
          wchar_t to store UTF-16, has to
          perform encoding conversions on all
          processed data?




          I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.



          My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.






          share|improve this answer


























          • Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

            – Benoit
            Dec 10 '10 at 8:28











          • @Benoit - Posted some updates.

            – Justin Cave
            Dec 10 '10 at 17:18
















          29














          If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.




          • There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.

          • For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.

          • Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.

          • In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.


          Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.



          Responding to your followups




          Our application is usually alone on
          the Oracle database and takes care of
          the data itself. Other software that
          connect to the database are limited to
          Toad, Tora or SQL developer.




          What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.



          I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.




          We also use SQL*Loader and SQL*Plus to
          communicate with the database for
          basic statements or to upgrade between
          versions of the product. We have not
          heard of any specific problem with all
          those software regarding NVARCHAR2.




          Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.




          We are also not aware that database
          administrators among our customers
          would like to use other tools on the
          database that could not support data
          on NVARCHAR2 and we are not really
          concerned whether their tools might
          disrupt, after all they are skilled in
          their job and may find other tools if
          necessary.




          While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.




          Could we also expect performance
          breakage if our application (that is
          compiled under Visual C++), that uses
          wchar_t to store UTF-16, has to
          perform encoding conversions on all
          processed data?




          I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.



          My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.






          share|improve this answer


























          • Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

            – Benoit
            Dec 10 '10 at 8:28











          • @Benoit - Posted some updates.

            – Justin Cave
            Dec 10 '10 at 17:18














          29












          29








          29







          If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.




          • There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.

          • For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.

          • Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.

          • In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.


          Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.



          Responding to your followups




          Our application is usually alone on
          the Oracle database and takes care of
          the data itself. Other software that
          connect to the database are limited to
          Toad, Tora or SQL developer.




          What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.



          I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.




          We also use SQL*Loader and SQL*Plus to
          communicate with the database for
          basic statements or to upgrade between
          versions of the product. We have not
          heard of any specific problem with all
          those software regarding NVARCHAR2.




          Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.




          We are also not aware that database
          administrators among our customers
          would like to use other tools on the
          database that could not support data
          on NVARCHAR2 and we are not really
          concerned whether their tools might
          disrupt, after all they are skilled in
          their job and may find other tools if
          necessary.




          While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.




          Could we also expect performance
          breakage if our application (that is
          compiled under Visual C++), that uses
          wchar_t to store UTF-16, has to
          perform encoding conversions on all
          processed data?




          I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.



          My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.






          share|improve this answer















          If you have anything close to a choice, use a Unicode character set for the entire database. Life in general is just blindingly easier that way.




          • There are plenty of third party utilities and libraries that simply don't support NCHAR/ NVARCHAR2 columns or that don't make working with NCHAR/ NVARCHAR2 columns pleasant. It's extremely annoying, for example, when your shiny new reporting tool can't report on your NVARCHAR2 data.

          • For custom applications, working with NCHAR/ NVARCHAR2 columns requires jumping through some hoops that working with CHAR/ VARCHAR2 Unicode encoded columns does not. In JDBC code, for example, you'd constantly be calling the Statement.setFormOfUse method. Other languages and frameworks will have other gotchas; some will be relatively well documented and minor others will be relatively obscure.

          • Many built-in packages will only accept (or return) a VARCHAR2 rather than a NVARCHAR2. You'll still be able to call them because of implicit conversion but you may end up with character set conversion issues.

          • In general, being able to avoid character set conversion issues within the database and relegating those issues to the edge where the database is actually sending or receiving data from a client makes the job of developing an application much easier. It's enough work to debug character set conversion issues that result from network transmission-- figuring out that some data got corrupted when a stored procedure concatenated data from a VARCHAR2 and a NVARCHAR2 and stored the result in a VARCHAR2 before it was sent over the network can be excruciating.


          Oracle designed the NCHAR/ NVARCHAR2 data types for cases where you are trying to support legacy applications that don't support Unicode in the same database as new applications that are using Unicode and for cases where it is beneficial to store some Unicode data with a different encoding (i.e. you have a large amount of Japanese data that you would prefer to store using the UTF-16 encoding in a NVARCHAR2 rather than the UTF-8 encoding). If you are not in one of those two situations, and it doesn't sound like you are, I would avoid NCHAR/ NVARCHAR2 at all costs.



          Responding to your followups




          Our application is usually alone on
          the Oracle database and takes care of
          the data itself. Other software that
          connect to the database are limited to
          Toad, Tora or SQL developer.




          What do you mean "takes care of the data itself"? I'm hoping you're not saying that you've configured your application to bypass Oracle's character set conversion routines and that you do all the character set conversion yourself.



          I'm also assuming that you are using some sort of API/ library to access the database even if that is OCI. Have you looked into what changes you'll need to make to your application to support NCHAR/ NVARCHAR2 and whether the API you're using supports NCHAR/ NVARCHAR2? The fact that you're getting Unicode data in C++ doesn't actually indicate that you won't need to make (potentially significant) changes to support NCHAR/ NVARCHAR2 columns.




          We also use SQL*Loader and SQL*Plus to
          communicate with the database for
          basic statements or to upgrade between
          versions of the product. We have not
          heard of any specific problem with all
          those software regarding NVARCHAR2.




          Those applications all work with NCHAR/ NVARCHAR2. NCHAR/ NVARCHAR2 introduce some additional complexities into scripts particularly if you are trying to encode string constants that are not representable in the database character set. You can certainly work around the issues, though.




          We are also not aware that database
          administrators among our customers
          would like to use other tools on the
          database that could not support data
          on NVARCHAR2 and we are not really
          concerned whether their tools might
          disrupt, after all they are skilled in
          their job and may find other tools if
          necessary.




          While I'm sure that your customers can find alternate ways of working with your data, if your application doesn't play nicely with their enterprise reporting tool or their enterprise ETL tool or whatever desktop tools they happen to be experienced with, it's very likely that the customer will blame your application rather than their tools. It probably won't be a show stopper, but there is also no benefit to causing customers grief unnecessarily. That may not drive them to use a competitor's product, but it won't make them eager to embrace your product.




          Could we also expect performance
          breakage if our application (that is
          compiled under Visual C++), that uses
          wchar_t to store UTF-16, has to
          perform encoding conversions on all
          processed data?




          I'm not sure what "conversions" you're talking about. This may get back to my initial question about whether you're stating that you are bypassing Oracle's NLS layer to do character set conversion on your own.



          My bottom line, though, is that I don't see any advantages to using NCHAR/ NVARCHAR2 given what you're describing. There are plenty of potential downsides to using them. Even if you can eliminate 99% of the downsides as irrelevant to your particular needs, however, you're still facing a situation where at best it's a wash between the two approaches. Given that, I'd much rather go with the approach that maximizes flexibility going forward, and that's converting the entire database to Unicode (AL32UTF8 presumably) and just using that.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Dec 10 '10 at 17:18

























          answered Dec 9 '10 at 18:34









          Justin CaveJustin Cave

          187k18283318




          187k18283318













          • Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

            – Benoit
            Dec 10 '10 at 8:28











          • @Benoit - Posted some updates.

            – Justin Cave
            Dec 10 '10 at 17:18



















          • Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

            – Benoit
            Dec 10 '10 at 8:28











          • @Benoit - Posted some updates.

            – Justin Cave
            Dec 10 '10 at 17:18

















          Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

          – Benoit
          Dec 10 '10 at 8:28





          Thank you for your answer. I have added some additional information to the original question regarding it. Could you please have an eye? Thanks!

          – Benoit
          Dec 10 '10 at 8:28













          @Benoit - Posted some updates.

          – Justin Cave
          Dec 10 '10 at 17:18





          @Benoit - Posted some updates.

          – Justin Cave
          Dec 10 '10 at 17:18


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f4401043%2foracle-text-will-not-work-with-nvarchar2-what-else-might-be-unavailable%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Bressuire

          Vorschmack

          Quarantine